openHAB 4.1 Milestone discussion

Seems a bit weird to use the browser language. It’s not consistent with websites, where you have an option to select the language. I have specified English in the regional settings, but the UI is partially in Portuguese.

I am seeing the error below on the most recent 4.1 builds. It happens with the ScriptExecution.createTimer function. I was not getting this error on 4.0. I need some guidance on how to provide better debug information.

OutOfMemoryError: java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
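Since "unable to create native thread" usually points at a thread limit rather than heap, one piece of debug information worth capturing is how many threads the JVM has started. Below is a minimal sketch using only the standard `java.lang.management` API. The class name `ThreadStats` is made up for illustration, and the code has to run inside the same JVM as openHAB to be meaningful; run standalone, it only reports on its own process.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of openHAB.
final class ThreadStats {

    /** Returns a one-line summary of the current JVM's thread counters. */
    static String summary() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        return String.format("live=%d peak=%d daemon=%d totalStarted=%d",
                bean.getThreadCount(),              // threads alive right now
                bean.getPeakThreadCount(),          // highest live count since JVM start
                bean.getDaemonThreadCount(),        // live daemon threads
                bean.getTotalStartedThreadCount()); // every thread ever started
    }

    public static void main(String[] args) {
        System.out.println(summary());
    }
}
```

Logging this line periodically should show whether the live count keeps climbing until the error appears.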

@mjcumming Please check this thread


I believe the root cause of the HTTP client thread/memory leaks needs further investigation as we move through 4.1. There seems to be an issue across several bindings which only appeared in the 4.1 milestones/snapshots. It was not present in 4.0 from what I can tell. There’s also a pool slowly leaking in the core (from what I’ve been able to see). There are several threads now about OOM errors and instability which all seem to tie back to the HTTP clients.

There were not a lot of changes in the 4.1 core framework. Maybe it is something related to the upgrade of the Jetty HTTP client version?
This change occurred after milestone 1, only in recent 4.1 snapshots.

I totally agree with you, I already brought up the question here: Language setting only partially respected

And I got the same answer, but it’s really not an intuitive feature

Completely agreed. It’s a bit confusing that it happened and I haven’t been able to pin down the culprit either. The neeo binding was very stable for years and has had 0 development/changes. No reason it should have just broken. That said, we have several bindings that haven’t been reported broken (yet) which use the same code. My concern is that we will roll 4.1, upgrades will happen, and the reports will come flooding in.

EDIT: missed the comment about the update. M1 is definitely impacted by this, there are reports from folks on it. This has to be a change from 4.0 release to M1.

I quickly checked the list of changes in the 4.1 M1 release notes.
I hope it is not this PR, which was introduced due to high CPU usage:

That’s also included in 4.0.2.

Good remark. So this is something else.

Would it be prudent to create an issue on GitHub to track this? Not sure if it should live in addons or core.

EDIT: been looking at some of the bindings that seem to have the issue. They all seem to use javax.ws.rs.client.ClientBuilder. I converted neeo to use org.eclipse.jetty.client.HttpClient, which seems to be stable. Perhaps the issue is with ClientBuilder? A quick grep shows it in use in many bindings.

Was the javax.ws.rs.client.ClientBuilder version changed between 4.0.3 and 4.1 M1?
I am sure @wborn could inform us.

I think it’s provided by CXF which was updated to 3.6.1. There seems to be a 3.6.2 too now with a fix that might help:

https://issues.apache.org/jira/browse/CXF-8885


There is now a snapshot build (3659) using CXF 3.6.2 (openhab-core#3826).
Perhaps you can give it a try and see if it helps with fixing the memory issues?


That looks very similar to what we’re experiencing. Looks like I have several things to upgrade and test when I get home.

Part 2: does it make sense to convert to the Jetty client even once this is fixed?

We may have a winner with 3659 on multiple fronts.

First, stock neeo seems stable after 12 hrs. I’ll keep watching, mine took 2-3 days to blow. @Mherwege you may want to give the new snapshot a spin too to see if it brings stability.

Second, I’ve enabled the folder logging for the thing loads and I for the life of me can’t get the failure to happen now.

Never mind, still broken. It stayed stable for the night because no one was using the remote. The moment someone used it, threads spiked and the whole system went sideways. I have 534 open threads that look like:

“HttpClient-23307-SelectorManager” Id=586871 in RUNNABLE (running in native)

And I’m again getting this error:

2023-10-05 11:33:58.179 [WARN ] [ab.core.internal.events.EventHandler] - The queue for a subscriber of type ‘class org.openhab.core.internal.items.ItemUpdater’ exceeds 5000 elements. System may be unstable.
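To see which thread families are piling up, such as the HttpClient-…-SelectorManager threads quoted above, it can help to group live threads by name with the numeric parts masked out. This is a rough sketch using only standard JDK calls. The class name `ThreadNameHistogram` is hypothetical, and it has to run inside the openHAB JVM to see those threads; a thread dump taken with an external tool would give the same names.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of openHAB.
final class ThreadNameHistogram {

    /** Masks every run of digits so pool members collapse into one key. */
    static String normalize(String threadName) {
        return threadName.replaceAll("\\d+", "#");
    }

    /** Counts live threads of the current JVM per normalized name. */
    static Map<String, Integer> histogram() {
        Map<String, Integer> counts = new TreeMap<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            counts.merge(normalize(t.getName()), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // One line per thread family: count, tab, masked name.
        histogram().forEach((name, n) -> System.out.println(n + "\t" + name));
    }
}
```

With this masking, a name like HttpClient-23307-SelectorManager is counted under HttpClient-#-SelectorManager, so a leaking family shows up as a single large number.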

It looks like the core had a Jetty “upgrade” between M1 and M2 which has broken the Hue API v2 binding …

A Jetty upgrade came in M2 with the upgrade of Karaf to 4.4.4.
@wborn

Sorry, I think some ALPN bundles went missing during the update. But fear not! I think we can get them back by also installing the pax-web-jetty-http2-jdk9 feature. I can create a PR, and maybe you can help test it, @AndrewFG? It seems nobody used Hue on a recent snapshot build (or nobody reported this issue).