[SOLVED] Error: Rule '<rulename>': null

I have tried doing that in my code but it has not helped - I think this has to do with the way we extract the item from the group.

postUpdate(actitem.name.split("_").get(0) + "_DIFFTEMP", diff)

I will try this version on mine and report back.

Hopefully we are getting somewhere!

Thanks

No, it’s a fact defined by my devices. The don’t publish the temperature in miliseconds, they publish the current temperature sometimes between 5 and 10 minutes. (IIRC roughly every 8 minutes).

I will make the change regarding the elimination of the findfirst for diffitem, this is the most promising change in my opinion. Thanks. Let’s see what happens.

Edit:
This will be difficult from now on. This error occured only twice after the rule change (on 2018-06-08). The rate went down from three times a day to once every three days. :wink:

Moving from members.findFirst to postUpdate(x,y) seems to have worked - no errors in the last 24 hours!

Having said that, I implemented this change in a fairly simple rule where I directly update the item I fetched from the group without any state checks. Most of my other rules include code that checks an item’s state before doing something - @rlkoshak, do you have any alternative methods to replace a piece of code like below?

val distanceName = userCode + "_DistanceFromHome"
val distanceItem = gDistanceFromHome.members.findFirst[ t|t.name == distanceName ]
if (distanceItem.state < 10000) {
  //do_something 
}

Thanks

val distanceItem = gDistanceFromHome.members.findFirst[ t|t.name == distanceName && t.state < 10000 ]
if (distanceItem !== null) {
    //do stuff
}

yes, thanks, but we are trying to avoid using

gDistanceFromHome.members.findFirst[x]

altogether as we suspect that is what’s causing the null errors…

Afraid not. If you need the state of the Item you found you must have a reference to the Item and the only way I know how to do that is through the members of a Group. Well, this isn’t exactly true. You could get the state of the Item by using the sendHttpGetRequest Action and the OH REST API, but that is going to be much slower and potentially introduce additional errors.

The true solution will be to figure out why we can’t access the members of a Group sometimes. Unfortunately that is going to be hard to reproduce so it might take some time.

Are you seeing the periodic error in these other Rules too? I doubt you have dozens of members of gDistanceFromHome so if our guess is right the problem should almost never occur on this Rule because the Group isn’t changing all that often.

I am seeing the error on all rules where I use members.findFirst (I was using members.filter before and I was getting the same error). I get it most frequently on one of my rules which executes every 20sec on a group with 3 members (Unifi binding updating client’s uptime), so that leads me to think it’s more about the frequency of rule execution rather than size of the group. I have left the rest of my rules unchanged and added logging so I am waiting to get the error and see what the logs say.

To be honest I expect other people are having the same problem but just have not noticed it - I only noticed it because one day I woke up to a cold shower, turns out my rule to update target temperatures on my 8 zones + hot water tank failed with the error that morning. If you get the error in a non-critical rule which executes frequently then there is no real impact, your code gets executed correctly the next time it runs.

I would love to get to the bottom of this and understand the root cause, but in the worst worst case, I can always go back to repeating the code and not using Groups at all if it means I get 100% execution!

Thanks

It really is more about how close together the Items are updating. If all three Items update at nearly the same time then our guess as to the cause of the problem would be supported. If the updates are spread out then something else is going on.

I’m sure others are seeing the error but I’m also certain that the error is quite rare. I have a dozen rules that use findFirst and filter on Groups that are constantly being updated and I’ve never seen this error.

I can’t remember if you’ve mentioned which version of OH you are running. 2.3 release? A 2.4 snapshot?

Most of the heavy users of OH frequently watch their logs for errors. Someone even wrote a Logreader Binding to get alerts when errors appear in their openhab.log. If the error were common I would have expected one of the users with a big deployment like sihui or sipvoip to have reported it already.

That sounds about right, the Unifi binding fires up every 20sec and updates all Uptime items at the same msec. My heating rule is cron based so it updates setpoints at the same time as well.

Sorry I forgot to mention, I am on 2.4 snapshot.

Let’s wait for the errors and see what the logs say. Thanks a lot for your guidance Rich!

I tend to look for outside-the-box solutions to problems. It sounds like you do have some control over the heating. Is there a way you can stagger the updates to the individual Items a little using a Thread::sleep or wait (if it is a shell script) or something like that? You don’t really have control over the Unifi binding but you do for the other.

What happens if you add a Thread::sleep(100) just before the findFirst lines for the Unifi binding Rule? If the problem is related to the Group updating in the background while the Rule is running, the sleep should give that enough time to finish before the Rule runs. It will add some latency (up to 600 msecs) to when the updates occur and when the rules execute for the last set of Items in the 30 temp updates case (only five can run at a given time).

On second thought that probably won’t work.

What if you deploy Design Pattern: Gate Keeper for that Rule? This will make the latency be even longer but what it will let us do is add a sleep after updating the member of gDiffTemp Items to give gDiffTemp a change to settle before the next Rule instance is run. You don’t need a proxy Item like described in the DP, just keep the same Rule trigger.

That’s right, at certain cron triggers I do the following:

gCHSchedule.members.forEach[ item|item.postUpdate(schedule) ]

I am guessing .forEach is a simple loop so the below would help?

gCHSchedule.members.forEach[ 
 item|item.postUpdate(schedule)
 Thread::sleep(100)]

In the beginning I had thought this was a race condition related issue so I added Thread::sleep(100) at every other line, but that did not help.

So would that be just a reentrant lock? I was thinking of experimenting with those today.

Yep. This is exactly what I’m talking about. Experiment with the sleep time to find the smallest value that works reliably.

I usually do the spacing a bit different but I think it won’t matter functionally.

gCHSchedule.members.forEach[ item |
    item.postUpdate(schedule)
    Thread::sleep(100)
]

The DP is more than just a reentrant lock. The reentrant lock provides the locking mechanism to prevent more than one instance of the Rule from actively running at the same time. But what you do inside the locked part of the Rule is also important. In this case you will want to capture the all the states that you can before the lock but not run the findFirst until after acquiring the lock. Then you would add a small Thread::sleep at the end just before unlocking to give the Item Registry a chance to finish updating the Group.

I’m always hesitant to suggest this approach because, in the 30 Item example, if they all update at the same time and you have a sleep of 100 msecs it will take 2.5 seconds before ANY other Rules will be able to run. So if you have other Rules in your system where a 2.5 second delay between the Rule trigger and the Rule execution occurs you will have only traded one problem for another.

There is a way to increase the Thread pool (I think it is one of the files in userdata/etc) so if you do go down the gatekeeper route, you will probably want to increase that from 5 to 35. It will increase the amount of required RAM significantly so that might not be a good option if you are on an RPi. You will have to experiment to find the largest number of threads in the pool that will fit in your available RAM.

You will also have to go and change it back everytime you upgrade OH.

Got it. Slapped a Thread::sleep(50) on there so let’s see how that works.

Will add the locks tonight. I don’t think I have massive groups - the largest is about 10 items, so there shouldn’t be a massive timing impact.

I’ve upped it to 10 for now. OH2 is running on a server with 32g memory so I doubt it will be a bottleneck.

3 days with the gatekeeper and no null errors on any of my rules so far! Keeping fingers crossed but this might be the solution for anyone else having the same problem. Thanks @rlkoshak!

1 Like

quick follow-up @rlkoshak : just realized I am using one global lock for all my rules triggered via “Member of” feature, and you use a hash map of locks in your Generic is Alive design pattern. I haven’t seen any side effects of using one lock yet but is there an apparent advantage of using an individual lock per trigger item?

If you use an individual lock per item then the rule will only be locked for that one item. For example, if item A triggers the rule and then item B triggers the rule:

  • 1 lock for the rule: the rule triggered on Item B will have to wait for the rule triggered on Item A to complete before it can run.

  • 1 lock her Item - the rule triggered for Item B will run in parallel with the rule triggered for Item A.

So, if you have one trigger for the rule the risk that events start to back up is higher and could become a problem. However, sometimes this is the behavior you are after (i.e you only want one instance if the rule running at a time no matter what item triggered it.)

And if you use one global lock for all your locked rules, that would lead to even worse delays as all the rules will be waiting for that one lock to be released?

So would I be right in thinking that we need one lock per rule in order to solve our null error here - since it was two or more different items being triggered very close to each other that brought about the errors?

I was thinking of applying the below code - does that look right to you?

var Map<String, ReentrantLock> locks = new HashMap

rule "A"
when
  Item x changed
then
   val ruleName = "ruleA"
   if (locks.get(ruleName) === null) {locks.put(ruleName, new ReentrantLock)}
   val lock = locks.get(ruleName)
   lock.lock
   try {}
   catch {}
   finally {lock.unlock}
end

rule "B"
when
  Item y changed
then
   val ruleName = "ruleB"
   if (locks.get(ruleName) === null) {locks.put(ruleName, new ReentrantLock)}
   val lock = locks.get(ruleName)
   lock.lock
   try {}
   catch {}
   finally {lock.unlock}
end```

It can be even worse. You only get 5 rules threads. If you have five or more rules waiting on one lock, then no rules will run, even those who don’t use the lock.

It depends on what you are trying to accomplish with the lock. You really should be using locks sparingly and only as a last resort.

  1. One rule or set of rules should not run at the same time to process events from the same Item at the same time then you need one lock per item.

  2. One rule should never have more than one instance running at the same time regardless of the event that triggered the rule.

  3. Only one rule from a set of rules should run at the same time regardless of the event that triggered the rule.

All three are valid use cases and all three require a different way of using locks. Your example code address 2. Your original code with only one lock addresses 3. But whether they is correct or not depends on what you are trying to accomplish.

Makes sense - I’m going for scenario 2 in this case. Thanks!

I am running into the same problem and it would be great if you could post you final solution as a code exmaple. I guess more people could find that useful.