Sometimes rules don't get executed?

Yes, I think so. It’s all rather odd. Keep a lookout for whether the “10 minute ticker” rule misses its expected time; that may help reveal the nature of the logjam.

See if you can figure out which is the first rule to appear when the “catching up” happens - it may be something about that one or its triggering that is the logjam.

What kind of host are you running on? I’m wondering if there is an external cause, like having trouble fetching the rules files from an SD card or something.

I’m using an RPi 3B with openHABian on it. The next thing I’ll try, I think, is to move my whole setup to another RPi and reserve it just for that (right now I have some other small scripts running on this RPi as well; performance doesn’t seem to be an issue, but they might cause some problem…).

The timer seems to get executed every time, on time (10-150ms after the 10min mark, so it is really good).
Two rules seem to “catch up” first:

  • The one I posted here in my original post.
  • I have another rule which takes every phone’s location and uses the Google API to return a postal address (this is good to have, because you know where that phone is without zooming and looking at a map). However, this rule has a big 20sec timeout specified, because unfortunately it won’t return anything if I reduce it to, let’s say, 10sec… I don’t know why, because the execution is much faster… But maybe this can cause issues. I only have 4 devices whose location is transformed with this rule, and I have increased the ruleEngine execution threads to 7, so there should always be at least 3 free threads (the location is polled, with the iCloud binding and Owntracks, so it shouldn’t be a problem if I’m correct, because the rule execution is max. 20 secs and the poll time is 10 min…).

Ps.: I have another rule which calculates whether the phone is Home or Away. Usually it runs together with my LocationToString (that’s what I call the rule above, which translates the location to a postal address). Can it cause a problem if together they want to use 8 threads (4 * 2) while only 7 are available, and maybe other rules are triggered as well?

Another approach I’m thinking of is to remove some rules, maybe this one, and see if it makes any difference. I really don’t know what other options I have…

Did you create some kind of recursion or circular dependency between rules (or the items they trigger on)?

You can increase the number of threads (org.eclipse.smarthome.threadpool:ruleEngine=20 in runtime.cfg) if you believe that to be an issue (it shouldn’t be under normal conditions).

Write performance can also be causing this (if you still write logs to SD card which you shouldn’t be doing).

I have eliminated circular calls, because I had problems with that as well in other rules.
Unfortunately I’m writing the logs to the SD… I also thought of redirecting /var/log to RAM but I didn’t have time for that. Or is there a better solution for this?

You mean that a slow SD card (as far as I can remember it is not a bad card at all, or at least it was good when it was new) can cause rules not to execute? Because it is not only that the logs are missing; the rules are not executed either. Or does it stop executing because it can’t read/write files?

Well, repetitive writes to SD can queue up and block threads that are then unavailable to execute rules, slowing everything down. And logging plus swapping can corrupt an SD fast, so there can even be a problem with a fairly new card. Yes, there are better solutions: tmpfs is one, separate USB or NAS storage are others.

I have a NAS. Do you have any experience redirecting the whole /var/log folder to it, and how will it do performance-wise? Both my RPi and NAS are connected via Ethernet, but my NAS is an older cheap D-Link…

What is the better approach for now? tmpfs or NAS?

I don’t like the sound of this, potential multiple copies of the rule hanging up awaiting Google responses.
It rather depends on the triggering conditions to “guarantee” only one run per device.
I wonder how Google reacts if you bang off multiple requests before it responds.
At the least, I would implement some locking so that only one Google lookup is in progress at a time.

Though it sounds like it should be simple enough to temporarily omit the lookup in this rule, for elimination purposes.

Yes that’s what I’m afraid of, that maybe this can cause some problem… I have tried today to implement a lock, but I couldn’t find a quick solution which seems to work…

NAS. See this post.
Now, while I don’t know whether it will help with your specific rules problem, for sure it’s a good idea to set up your system for fewer writes, with a UPS and backup in place.

Thanks, I’ll read through that. I hope that fixing my rules and moving the logging somewhere else will help and this “unsolvable” error will be gone…

I was able to set up the lock… hope it helps. However, I think I have a problem with it: usually all the items get updated together, so only the first item will execute the rule; the others will return because the lock is already held. Is it somehow possible to limit a rule to one thread at a time but queue up the other invocations, so they get executed one after another?

Thanks!

That depends how you use the lock. You can code to use the lock for queue purposes instead. mylockvariable.lock() will queue for the lock. Perhaps your code tests the lock state first, and aborts if already locked?

Note - that won’t ease your rule thread consumption, but it will manage your Google lookups one at a time.
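To make the difference concrete, here is a minimal sketch of the two ways to use a lock. It is written in Python with threading.Lock purely as an illustration; the names lookup_lock, lookup_or_skip and lookup_queued are made up for this example. In a rule using Java's ReentrantLock, the counterparts would be tryLock() for the skip variant and lock() for the queue variant.

```python
import threading

# Hypothetical shared lock guarding the external lookup.
lookup_lock = threading.Lock()


def lookup_or_skip(do_lookup):
    """Skip variant: give up at once if another lookup is in progress."""
    if not lookup_lock.acquire(blocking=False):
        return None  # someone else holds the lock, so this call is dropped
    try:
        return do_lookup()
    finally:
        lookup_lock.release()


def lookup_queued(do_lookup):
    """Queue variant: wait for the lock, then run the lookup."""
    with lookup_lock:  # blocks the calling thread until the lock is free
        return do_lookup()
```

With the skip variant a burst of updates results in a single lookup; with the queue variant every update is eventually looked up, but each waiting caller keeps its thread occupied while it waits.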

I think I found a way to do it. Any improvements are welcome:

import java.util.concurrent.locks.ReentrantLock

val ReentrantLock stompingLock = new ReentrantLock()

// Google API key
val String GoogleAPIKey = "API-KEY"

rule "Location to String - Google"
  when
    Member of gPhoneLocation changed
  then
    // Acquire the lock before entering try, so that finally only unlocks a lock we actually hold
    stompingLock.lock()
    try {
      val PointType location = (triggeringItem.state) as PointType
      val nameParts = triggeringItem.name.toString.split("_")
      val triggeringiPhoneName = nameParts.get(0) + "_LocationString"
      val Latitude = location.latitude
      val Longitude = location.longitude
    
      // Building the GeoCodeURL
      val geocodeURL = "https://maps.googleapis.com/maps/api/geocode/json?latlng=" + Latitude + "," + Longitude + "&sensor=true&key=" + GoogleAPIKey
      var String geocodeResponse

      // Trying to get the location
      try {
        geocodeResponse = sendHttpGetRequest(geocodeURL, 10000)
      }
      catch(Exception e) {
        logWarn("LocationToString.rules", "Received timeout exception, skipping item update ->" + geocodeResponse)
      }

      // Formatting the address
      val String formattedAddress = transform("JSONPATH", "$.results[0].formatted_address", geocodeResponse)
      logInfo("LocationToString.rules", nameParts.get(0) + " -> " + formattedAddress)

      // Skipping item update if address is null, replacing it with the old data indicating it with a '!'
      if(formattedAddress !== null) {
        postUpdate(triggeringiPhoneName, formattedAddress)
      }
      else {
        val locationItem = gPhoneLocationString.members.findFirst[i | i.name == triggeringiPhoneName]
        val String oldFormatAddress = "! " + locationItem.state
        postUpdate(triggeringiPhoneName, oldFormatAddress)
      }
    }
    catch(Throwable t) {
      logError("LocationToString.rules", "Lock error")
    }
    finally {
      stompingLock.unlock()
    }
end

Ps.: I have moved the /var/log folder to my NAS, waiting for rules to stop…

In all likelihood this is going to make it much worse. Each Rule waiting to be granted the lock will be consuming a Rule execution thread.

The lock will only help matters if you skip the Rule if there is an instance of the Rule already running.

Only lock the bare minimum lines of code. If the only line that needs to be locked is the sendHttpGetRequest, only lock that line. You want the locked code to run as fast as possible.
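As a rough illustration of "lock only the bare minimum", here is a small Python sketch. The names http_lock, handle_location_change and fetch are invented for the example, and fetch stands in for whatever makes the slow HTTP call. Building the query and tidying up the response happen outside the lock; only the external call itself is serialized.

```python
import threading

# Hypothetical lock that serializes only the slow external call.
http_lock = threading.Lock()


def handle_location_change(latitude, longitude, fetch):
    # Cheap preparation: no lock needed here.
    query = f"{latitude},{longitude}"

    # Only the external request runs while the lock is held.
    with http_lock:
        response = fetch(query)

    # Post-processing happens after the lock has been released.
    return response.strip() if response is not None else None
```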

You may want to move this code outside of OH and into an external program. For example, send an MQTT message when gPhoneLocation changed with the info, let an external Python program issue the call to googleapis and then send an MQTT message or make a REST API call back to OH to update the Items. Then all the waiting for Google to respond, the locking and all that is handled outside of OH and not consuming rules execution threads.
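A minimal sketch of what the core of such an external Python program might look like, using only the standard library. The function names are made up for this example, and the response shape (results[0].formatted_address) is taken from the JSONPATH expression in the rule above. The MQTT wiring is deliberately left out; a client library such as paho-mqtt would be one way to receive the coordinates and publish the address back, but that part is an assumption and not shown.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint as used in the rule above.
GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(latitude, longitude, api_key):
    """Build the reverse-geocoding URL for one coordinate pair."""
    params = {"latlng": f"{latitude},{longitude}", "key": api_key}
    # safe="," keeps the comma between latitude and longitude unescaped.
    return GEOCODE_ENDPOINT + "?" + urlencode(params, safe=",")


def extract_formatted_address(body):
    """Return results[0].formatted_address, or None if it is not there."""
    try:
        return json.loads(body)["results"][0]["formatted_address"]
    except (TypeError, ValueError, AttributeError, IndexError, KeyError):
        return None


def lookup_address(latitude, longitude, api_key, timeout=10):
    """Perform the HTTP call; may raise on network errors or timeouts."""
    url = build_geocode_url(latitude, longitude, api_key)
    with urlopen(url, timeout=timeout) as response:
        return extract_formatted_address(response.read().decode("utf-8"))
```

The script would call lookup_address for each incoming location message and publish the result, or nothing if it returns None, so all the waiting for Google happens outside OH.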

Thanks for your response!

I think I will give it a go and write a Python script… So I should create an MQTT item (or as many as I have devices) which sends the location to the Python script, and the Python script publishes the state to the corresponding item.

You could get some inspiration or some code from here. If you build an HTTP query sensor add-on and submit a PR I’ll be sure to accept it. :wink:

Thanks for the link :slight_smile: I have never done anything with MQTT (besides using it…). I’ll try to do something which can be used in general.
However, I think it is a little bit complicated, because you have to build the HTTP request, which depends on the service you use. (Or were you thinking of sending the already built URL over MQTT and just executing the HTTP request in Python?)

Anyway, I tried my setup with the log written to my NAS. It seemed a lot better, but I messed up other things, so I had to revert it. I’ll give it another go when I have time, but I think this will solve my main problem.

The advantage of sensorReporter as a starting point is the MQTT part is already handled for you. You just need to write the code to process the message, make the HTTP call, and publish the result. Look at the other sensors for examples of how to do all that.

Either send the URL as the message or have the base URL be part of the sensor configuration. Either seems reasonable. I use the former approach in the execActuator.

Yes, since then I had a look at the code :slight_smile:
I will do it as soon as I can, test it in my setup and issue a PR

It all depends on where the logjam is happening; we don’t know that this is a rules thread limit issue yet. The suggestion to use a lock was to neatly serialize requests fired at Google, for fear of what a rapid burst of HTTP requests would mess up.
The Google end probably throttles things if you blitz it, for example.

A proper asynchronous HTTP call for rules/Items would be a real blessing.

It might need some additional (perhaps optional) cleverness to enforce one-at-a-time query-response working with a specified target.
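Outside of the rule engine, one way to get both properties (the caller does not wait, yet requests to one target run strictly one at a time) is a single-worker queue. The sketch below is Python and only illustrates the idea; SerialRequester is a made-up name, not something openHAB provides.

```python
from concurrent.futures import ThreadPoolExecutor


class SerialRequester:
    """Runs submitted calls one at a time, in submission order,
    on a single background thread (hypothetical helper)."""

    def __init__(self):
        # One worker means at most one request is in flight at any moment.
        self._executor = ThreadPoolExecutor(max_workers=1)

    def submit(self, func, *args):
        # Returns immediately with a Future; the caller is not blocked.
        return self._executor.submit(func, *args)

    def shutdown(self):
        # Wait for queued requests to finish, then stop the worker.
        self._executor.shutdown(wait=True)
```

A caller would keep one SerialRequester per target service, call submit(lookup, latitude, longitude) and either attach a callback to the returned Future or collect the result later, so the calling thread is free right away.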
