"Thread-12472" Id=21452 in BLOCKED on lock=org.apache.commons.exec.Watchdog@1b193a1
owned by WATCHDOG Id=21455
at org.apache.commons.exec.Watchdog.stop(Watchdog.java:67)
at org.apache.commons.exec.ExecuteWatchdog.stop(ExecuteWatchdog.java:122)
- locked org.apache.commons.exec.ExecuteWatchdog@1780e91
at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:360)
at org.apache.commons.exec.DefaultExecutor.access$200(DefaultExecutor.java:46)
at org.apache.commons.exec.DefaultExecutor$1.run(DefaultExecutor.java:188)
"WATCHDOG" Id=21455 in BLOCKED on lock=org.apache.commons.exec.ExecuteWatchdog@1780e91
owned by Thread-12472 Id=21452
at org.apache.commons.exec.ExecuteWatchdog.timeoutOccured(ExecuteWatchdog.java:144)
at org.apache.commons.exec.Watchdog.fireTimeoutOccured(Watchdog.java:55)
at org.apache.commons.exec.Watchdog.run(Watchdog.java:81)
- locked org.apache.commons.exec.Watchdog@1b193a1
at java.lang.Thread.run(Thread.java:748)
my question is how can identify what is causing this problem is there a way to know for example where the Thread-12472" Id=21452 is initiated from
I do not know if the BLOCKED threads 12472 and 43587 are indeed caused by a rule. I may not have any rules running but I have BLOCKED threads that remain blocked and they increase day by day . I deleted almost all of my rules and for the remaining rules I have to press a button to start them
my question is more simple. Is there a way in openhab to identify where the thread 12472 and 43587 started from ?
Read down on the thread that Vincent posted. there are various things you can do to figure out the state of the thread pools.
But based on your description you have:
long running Rules (calls to Thread::sleep, executeCommandLine, sendHttp*Request, etc.)
locked Rules (ReentrantLocks)
But ultimately, unless you know programming and can use a Java profiler, there really isn’t much you can do to identify the what started a thread based on the thread ID, or if there was it wouldn’t be all that meaningful to you anyway. It’s not going to tell you what Rule is stuck which is really what you want to know.
I would like to thanks the community for all the support. What I did was to rewrite and streamline all of my rules. I discovered that I had a rule for a movement sensor that was starting a timer and that timer was executing a file with a return but the brief moment when the movement was retriggered that timer was canceled, therefore I had a hanging thread
Even after that, the problem remained. After making a clean install on another raspberry pi and movng all of the rules there, then the problem stopped.
I am happy that the problem is solved. I still do not know what was causing the problem. I believe that it was a combination of a bad rule with something else
@rlkoshak Thanks your posts were very helpful in learning openhab
Just to be clear, you had a call to executeCommandLine in a Timer and the Timer was cancelled while the executeCommandLine was already running? And therefore the executeCommandLine stuck around consuming a thread?
I’m just trying to understand the exact behavior you are describing. If you can, posting the Rule would be useful.
rule "Kitchen Sensor"
when
Item Kitchen_Sensor_Occ changed
then
if (Settings_Kitchen_Sensor_Disable.state != ON && Settings_Kitchen_Sensor_Time.state !== null && Settings_Kitchen_Sensor_Lux.state !== null) {
if (Kitchen_Sensor_Time !== null){
Kitchen_Sensor_Time.cancel
Kitchen_Sensor_Time = null
}
if (Kitchen_Sensor_Occ.state == ON) {
if (Kitchen_Sensor_Il.state < Settings_Kitchen_Sensor_Lux.state || Kitchen_Sensor_Il.state === null) {
if (Settings_Kitchen_Sensor_Switch_On_Light1.state == ON){
Kitchen_Light.sendCommand(ON)
}
if (Settings_Kitchen_Sensor_Switch_On_Light2.state == ON){
Corridor_Spots.sendCommand(ON)
}
}
}
if (Kitchen_Sensor_Occ.state == OFF){
Kitchen_Sensor_Time = createTimer(now.plusSeconds((Settings_Kitchen_Sensor_Time.state as DecimalType).intValue)) [|
Kitchen_Light.sendCommand(OFF)
Corridor_Spots.sendCommand(OFF)
Kitchen_Sensor_Time = null
]
}
}
end
and this is the Corridor_Spots.sendCommand(ON)
rule "Corridor_Spots"
when
Item Corridor_Spots received command
then
val result15=executeCommandLine("bash /etc/openhab2/cmd/Corridor_Spots.sh " + (receivedCommand), 2000)
Corridor_Spots.postUpdate(result15)
end
Now changed to
rule "Kitchen Sensor"
when
Item Kitchen_Sensor_Occ changed
then
if (Settings_Kitchen_Sensor_Disable.state != ON && Settings_Kitchen_Sensor_Time.state !== null && Settings_Kitchen_Sensor_Lux.state !== null) {
if (Kitchen_Sensor_Time !== null){
Kitchen_Sensor_Time.cancel
Kitchen_Sensor_Time = null
}
if (Kitchen_Sensor_Occ.state == ON) {
if (Kitchen_Sensor_Il.state < Settings_Kitchen_Sensor_Lux.state || Kitchen_Sensor_Il.state === null) {
if (Settings_Kitchen_Sensor_Switch_On_Light1.state == ON){
executeCommandLine("bash /etc/openhab2/cmd/Kitchen_Light.sh ON")
Kitchen_Light.postUpdate(ON)
}
if (Settings_Kitchen_Sensor_Switch_On_Light2.state == ON){
executeCommandLine("bash /etc/openhab2/cmd/Corridor_Spots.sh ON")
Corridor_Spots.postUpdate(ON)
}
}
}
if (Kitchen_Sensor_Occ.state == OFF){
Kitchen_Sensor_Time = createTimer(now.plusSeconds((Settings_Kitchen_Sensor_Time.state as DecimalType).intValue)) [|
executeCommandLine("bash /etc/openhab2/cmd/Kitchen_Light.sh OFF")
Kitchen_Light.postUpdate(OFF)
executeCommandLine("bash /etc/openhab2/cmd/Corridor_Spots.sh OFF")
Corridor_Spots.postUpdate(OFF)
Kitchen_Sensor_Time = null
]
}
}
end
AND TO
rule "Corridor_Spots"
when
Item Corridor_Spots received command
then
createTimer(now, [ |
val result15=executeCommandLine("bash /etc/openhab2/cmd/Corridor_Spots.sh " + (receivedCommand), 2000)
Corridor_Spots.postUpdate(result15)
])
end
Also I removed all the loops I had in my rules and replace them with timers
I think the Timer is a red herring. But what you did do is you move the calls to executeCommandLine from the Rules thread pool to the Timer thread pool. I’d add some logging and perhaps some alerting so you can determine whether executeCommandLine is returning or not. There might be a bug here where the timeout is being ignored and not killing the call if the script takes too long to execute. If that is the case, you will run out of Timer threads eventually too.
HOW DO I CHECK how many timed threads are running ? or if they are failing ??
When I created the scripts I tested them extensively
The script will take commands ON,OFF, 0-100, READ (read returns the real state of the fixture).
All the other commands change the state of the fixture and they give back the real state
the script will run for exactly 1.5 seconds if withing 1.5 seconds there is no reply from the equipment the script will return the last known state of the fixture and exit.
The results of the command (what I assume you mean by “command back”) is just what the script happens to print to stdout. If the script doesn’t print anything, result15 will be the empty String.
There is a whole lot of stuff between your call to executeCommandLine and the script actually being executed. Something could go wrong at any place in that stuff that would cause the call to never return. I couldn’t narrow it down without some sort of stack trace. But honestly, I personally don’t trust executeCommandLine. I’ve seen lots of unexpected behaviors from it on this forum so it ranks pretty high on my list of suspects.
It’s worth a try for sure. We have no idea what is causing the problem so it might work. It also might not. At this point, we don’t really even know if the call to executeCommandLine is in fact a problem. Or if it is a problem, that the root cause of the problem was really something to do with your old setup which has been “fixed” when you moved to the new SD card.
I will give it a try and post the results here. Since I do not know jprofiller, shall i revert to the old setup where I can monitor the process with
shell:threads --list |grep “BLOCKED”
or what if i use shell:threads --list |grep “TIMED” |wc -l
is the above command going to give me all the timed events. If yes then if there is a problem I will be seeing all the timed events increase day by day
I don’t know the name of the timer thread pool. It’s handled by the Quartz library I think but that’s as much as I know.
There is only 10 threads on the timer thread pool (assuming a recent OH 2.5). If you see it grow bigger than that, you are not looking at the right pool. You should have a general feel for how many Threads should be executing at the same time based on your Rules (Astro, Timers, and Time triggered Rules all run out of this pool). If you see more than that you know either what you understand about how your Rules execute is wrong or you have something getting stuck.