Big delay between command and action

Hi,

For a few weeks now I have had a big recurring problem with my openHAB installation (1.8.3 and 1.10 addons, latest java from Oracle, macOS 10.12 and 10.13):

Some time after starting openHAB I notice big delays (several minutes) between issuing a command, e.g. in the web interface, and the resulting action, like switching on a light.

What I have noticed so far:

  • The delay occurs with devices controlled by several bindings (e.g. KNX and Denon)
  • The CPU usage of openHAB is normal (1.2%), the web interface is responsive
  • Cron rules are not executed at the expected time (tested with a rule that only writes a log message every 5 minutes)
  • When the problem occurs and I stop and restart openHAB, pending commands (e.g. a light was switched off in the web interface but the command was not sent to KNX) are sometimes executed when openHAB is stopped or started again.
  • Values received from KNX are not processed correctly or timely
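
For anyone wanting to quantify the cron symptom above, here is a minimal sketch that scans a log for the heartbeat entries of such a 5-minute test rule and reports gaps that are longer than expected. The log line layout (timestamp in the first 19 characters), the marker text `cron heartbeat` and the function name `find_late_runs` are assumptions made for illustration, not something taken from the original setup.

```python
from datetime import datetime, timedelta


def find_late_runs(log_lines, marker="cron heartbeat",
                   expected=timedelta(minutes=5),
                   tolerance=timedelta(seconds=30)):
    """Return (previous, current, gap) for consecutive heartbeat entries
    that are further apart than expected + tolerance."""
    stamps = []
    for line in log_lines:
        if marker not in line:
            continue
        # Assumed layout: the line starts with "YYYY-MM-DD HH:MM:SS".
        stamps.append(datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S"))

    late = []
    for prev, cur in zip(stamps, stamps[1:]):
        gap = cur - prev
        if gap > expected + tolerance:
            late.append((prev, cur, gap))
    return late


# Hypothetical sample lines; a real log would be read from a file instead.
sample = [
    "2017-09-01 12:00:00.012 [INFO ] [heartbeat] - cron heartbeat",
    "2017-09-01 12:05:00.020 [INFO ] [heartbeat] - cron heartbeat",
    "2017-09-01 12:17:42.500 [INFO ] [heartbeat] - cron heartbeat",
]

for prev, cur, gap in find_late_runs(sample):
    print(f"late run: {gap} between {prev} and {cur}")
```

Comparing the times of the reported gaps with the times at which commands were delayed may help to show whether both symptoms start together.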

What I have tested:

  • Removed all bindings added in the last few months
  • Disabled all rules added in the last few months
  • Restarting my wiregate, which is used for KNX communication
  • Sending KNX commands via groupswrite on the wiregate when the problem occurs; they are processed correctly

Currently I am running out of ideas, especially since openHAB worked without major problems for several years and I have not changed the setup in a big way.

Any help or ideas what else to test would help me a lot!

Thanks in advance,
Juelicher

Unfortunately, I have no suggestions. However, since you have already proven that the problem is not isolated to a specific binding, that means it is a problem with the OH core or with something external to OH.

If it is a problem with OH core, even if the problem does get identified it will never get fixed. 1.8 is end of life and no further development is done on it.

If it is a problem external to OH it is going to be pretty hard to isolate. You are already doing everything I can think of.

Since the problem happened relatively suddenly I’m going to guess it is caused by something external to OH. What has been updated, upgraded, installed, or removed recently?

I wonder about something being deprecated in Java – used in the OH1.x versions with older Java(s), but deprecated to the point that “latest java from Oracle, macOS 10.12 and 10.13” is beyond deprecated to “no longer there…period”. Just my “spider-sense” :slight_smile:

Thanks for your ideas!

macOS 10.12 should be OK, I have been using it since it was released, but one of the latest updates might have broken something. 10.13 shows exactly the same problems.

This is really strange! A few weeks before my vacation I stopped making changes to openHAB (and changing settings) so as not to break anything while I was abroad. After my vacation the problems started. This indicates that the problems are triggered by the settings I changed back after my vacation.

Fortunately I have all changes to my configuration within a git repository. To be safe and to make sure that I did not miss problems during summer, I am currently testing the configuration as it was in March; everything worked as expected at that time.

The next step would be downgrading Java 8 to build 131, the oldest version I have available.

Switching to openHAB 2 is on my schedule. I first tried to switch at the beginning of the year, but could not because of a bug in the KNX binding. Maybe I have to hurry a bit, but would much prefer to switch during winter, when I have more time available…

I understand your “switching reluctance”. I had the same thing but decided to jump in a few months back.

My two cents (US slang for “my opinion”, more or less) is that there are a few things that are clearly worth the “switching cost”.

  1. The updated EXPIRE binding – obviates the need for MOST timer-style housekeeping and lets you centralize those condensed “timers” via EXPIRE as an ITEM so they span the scope of multiple rules files. Huge saving in “book-keeping” in your rules if you now use timers.

  2. The stricter load-time checking on .items and .rules files – at first (and for a while, truthfully) – it is a pain–(somewhere), but it is worth the pain for the value of testing the edges of your “mental model” of your system.

  3. The better (but possibly over-segmented) logging structure – ESSENTIAL when you are doing any sort of complex automation

  4. The JSR-223 scripting support. Personally, I’ve gotten a LOT better at some of the Xtend syntax and conceptual edges over the last 2 months, but sometimes it just falls short. JSR-223 (aka Jython, but others too) surfaces interfaces which are nearly essential when you are doing any sort of complex orchestration of your automation targets. Two-way orchestration is reasonably simple in Xtend, but anything beyond 2-way becomes a rat’s nest.

There is a 1.10 version of Expire for use in OH 1.8. While I’m glad you switched I hope this wasn’t the primary driver. :slight_smile:

I’d like to hear more about your observations on this because I saw no appreciable difference between logging on 1.8 versus 2.x in this regard. Based on my experience OH’s approach to logging is pretty standard for this type of application. What I can say though is that the application of what constitutes an INFO versus a DEBUG level log statement is not terribly consistent across all the bindings.

Also available in 1.8.

It’s a great list but, except for point 2, not really specific to OH 2.

To me the main reasons to upgrade are:

  1. 1.8 is end of life in terms of development. Neither bug fixes nor new features will be added to it, which means it will most likely die at some point with a fatal bug and then I’d have to scramble to upgrade.

  2. The core of OH 2 is as stable as 1.8. As with 1.8, the stability/maturity of add-ons varies, but the most commonly used add-ons seem pretty solid (Astro being the exception, which had some serious issues in the 2.1 release).

  3. There are a number of new technologies being released all the time and there are and probably only will be 2.x version bindings for them.

There are lots of new features being added to OH 2 which may not be as attractive to OH 1.8 migrators, but which serve to greatly reduce the initial complexity of learning OH. This has already brought in large numbers of new and somewhat less technically capable users who are still managing to be successful.

And with the advances to OH’s new Experimental Rules Engine, I believe JSR223 (or whatever it will be renamed to when OH migrates to Java 1.9 and that addon becomes part of the core JRE) is a first-class citizen when it comes to Rules development. When the Experimental Rules Engine stops being Experimental, I suspect that the primary choices for new users (I don’t see the existing Rules DSL going away anytime soon) will be the Graphical Scratch based browser rules programming or one of the JSR223 languages with legacy support for the existing Rules DSL.

ACK all your reasons.

I did not know about EXPIRE until after I started the switch to 2.x.

Until I started using EXPIRE (along with virtual aka PROXY items) for common timers and items across segmented items/rules files the asymmetry of the namespaces drove me crazy. EXPIRE and virtual/PROXY items let me really split items and rules consistently into files that were logically-inspectable (ie a page or two of code in VS-Code and hence did not require AWFUL QA regression testing with every iteration).

Between the two, I’ve been able to split a single “Empire State Building”-sized items file into 50 or so files of logically-related items and condense multiple rules files into a much smaller number of rules “meta-” files. A large part of the latter was using groups, but as we’ve discussed, groups have their limitations. I’ve gone through something like 8 major versions of re-architecting in the migration (along with addressing some scaling issues in my own world, so “8” is definitely overloaded).

HOWEVER, I will agree that the single biggest reason to “bite-the-bullet” is 1.x end-of-life.

Just a little update…

I tried to successively disable parts of my installation to localize the problem. It disappeared after deleting the mapdb storage files, so I assume that it is caused by one of the settings that are persisted and restored on system startup. Very strange, as no change I made within the last year is left and openHAB was running rock solid before. I updated Java to 8.144 weeks ago, but openHAB was stable after that.

@bob_dickenson
Thanks for advocating openHAB 2 and pointing out the advantages. Especially JSR223 scripting sounds as if a lot of my prayers have been heard! Somehow I missed it in the past.

I will surely speed up the migration and give it another try within the next few weeks. The last time I tried, a bug in the KNX binding, occurring in conjunction with openHAB 2, was a show stopper for me. Maybe this has been fixed; I am not sure but will check it out. But to be honest, the main advantage of openHAB 2 at this time would be an improved startup behavior without the regular “file not found” errors.

Nonetheless I would feel better if I could find the cause for my problems…

And another update:

I switched to openHAB 2.1 at the beginning of November, a few weeks later to the 2.2 snapshot, and last week to 2.2 stable. Since November everything was fine, but today I noticed the same problem again.

As I think that it is related to the KNX binding, I will start a new thread with a more fitting title!