Mmh, I sometimes do experience this, too, but of course it’s hard to say if the root cause is the same.
The first thing to check is whether config or rule changes force an internal recompile the first time the code is used (which can be quite some time after the actual change). I’m seeing CPU spikes at times, but I didn’t investigate further. Either way, try not to change anything for an hour and see if your problem (accessing the same items/rules) can be reproduced an hour later.
For me, that helped a lot. Very few to no visible occurrences as long as I don’t change anything. I guess that’s the origin of the saying ‘Never touch a running system’.
But in my experience, when things get slow, it’s rarely the OH main software or the HW. It’s usually a communication channel that enforces serialization, where message queueing sometimes adds up to noticeable delays.
I have ~100 devices, several groups and rules, and run it on an RPi2. It’s far less powerful than your HW, still it’s idling most of the time (except when restarting OH).
Now I didn’t cross-check with the code, but the usual way of programming is to log messages AFTER a successful operation. So it doesn’t mean that in the 2-3 seconds of apparent inactivity there’s NO OH communication to the device. Could mean as well there IS, but it ain’t successful. Or could mean the comms channel is blocked so OH cannot send.
Radio delay and timers add up. If you do the math, sending a series of commands (as you do if you address groups of items) WILL take quite some time.
Remember radio comms is a SINGLE, completely serialized channel with a potentially large queue. You can have many rules at the same time waiting for the radio commands they issued to finish.
Think of it as a sports event: many people rush to the stadium in fast cars, but then everybody has to line up at a single entrance for access control. A faster car won’t help.
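To put rough numbers on "do the math", here is a back-of-the-envelope sketch in Python. The function name and all figures (100 ms per command, 500 ms per retransmit) are made up for illustration, not measured values from zwave or OH. The only point is that on a serialized channel the per-command times simply add up.

```python
def total_send_time_ms(n_commands, per_command_ms, retransmits=0, retransmit_penalty_ms=0):
    """Estimate how long a serialized radio channel needs for a batch of commands.

    All parameters are hypothetical illustration values, not protocol constants.
    """
    # Commands go out one after another, so their times add up.
    base = n_commands * per_command_ms
    # Every retransmit blocks the channel for additional time.
    extra = retransmits * retransmit_penalty_ms
    return base + extra


# A group of 20 items, 100 ms per command, 2 of them needing a 500 ms retransmit.
print(total_send_time_ms(20, 100, retransmits=2, retransmit_penalty_ms=500))  # 3000
```

So with these assumed numbers, a single group command already keeps the radio busy for about 3 seconds, and anything else queued behind it has to wait that long.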
There’s a lot happening at the radio layer that you usually don’t notice: status (reachability) messages, radio interference resulting in retransmits. You don’t get OH debug messages for most of these things.
Plus, there’s a regulatory rule that each device using the 868MHz frequency (EU) may transmit at most 1% of the time (a duty cycle limit), and as far as I know, device manufacturers have implemented it properly. That also applies to the controller(s), effectively introducing delay, especially at peak times. I’ve seen that happen.
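To get a feeling for what the 1% limit means in practice, here is a small sketch. It assumes the limit is counted over a one-hour window and that a message occupies the air for 50 ms. Both are assumptions of mine for illustration; the real airtime depends on protocol, data rate and payload.

```python
def airtime_budget_ms(duty_cycle_percent=1, window_s=3600):
    """Transmit time allowed per window under a duty cycle limit (integer ms)."""
    return window_s * 1000 * duty_cycle_percent // 100


def max_messages(airtime_per_message_ms, duty_cycle_percent=1, window_s=3600):
    """How many messages of a given (assumed) airtime fit into the budget."""
    return airtime_budget_ms(duty_cycle_percent, window_s) // airtime_per_message_ms


print(airtime_budget_ms())  # 36000 ms, i.e. 36 seconds of airtime per hour
print(max_messages(50))     # 720 messages per hour at an assumed 50 ms each
```

That sounds like a lot, but the controller shares its budget across ALL devices it talks to, and retransmits eat into it as well.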
Plus, some effects are hard to find. You can’t properly see them in logs, such as this one:
I recently noticed that after sending a couple of zwave commands, I kept getting log messages about my maxcube (thermostat management server) not responding properly.
maxcube communicates with thermostats using a proprietary radio protocol using the same radio frequency as zwave does. The zwave binding ensures that zwave messages don’t collide, and probably so does maxcube, but if you think about it, it becomes obvious that at peak times, zwave and maxcube messages interfere because the two subsystems don’t know of each other. Once a zwave and a maxcube message collide, both subsystems will resort to a series of retransmits, still independent of each other, in effect greatly reducing the effectiveness of the radio band.
What to do? Well, I’m struggling somewhat, too.
I’ve checked radio timers and retransmit options (to the extent possible).
I’ve rearranged the order of commands to obtain zwave and maxcube ‘blocks’ of messages and put a Thread::sleep() in between.
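The idea behind the ‘blocks’, sketched in Python just to show the logic (in an OH rule you would do the same thing with your own commands and Thread::sleep()). The function, the item names and the pause length are hypothetical; `send` is a stand-in for whatever actually issues the command.

```python
import time


def send_in_blocks(commands, send, pause_s=1.0, sleep=time.sleep):
    """Send commands grouped by radio protocol, pausing between the groups.

    commands: list of (protocol, item, value) tuples (hypothetical format).
    send:     callable that actually issues one command.
    """
    # Group by protocol, keeping first-appearance order of protocols
    # and the original order of commands within each protocol.
    blocks = {}
    for cmd in commands:
        blocks.setdefault(cmd[0], []).append(cmd)

    for index, block in enumerate(blocks.values()):
        if index > 0:
            # Give the previous subsystem time to drain its radio queue.
            sleep(pause_s)
        for cmd in block:
            send(cmd)


# Mixed-up commands come out as one zwave block, a pause, then one maxcube block.
send_in_blocks(
    [("zwave", "Light1", "ON"), ("maxcube", "Thermo1", 21), ("zwave", "Light2", "OFF")],
    send=print,
    pause_s=0.1,
)
```

How long the pause needs to be is trial and error; it has to cover the time the first subsystem needs to get its queue out, including retransmits.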
See if that might apply to your setup as well.
Markus