Syslog Errors (100 GB)

Wolfgang1966 · March 21, 2021, 10:26am

The error in the log mainly looks to me as if influxdb is not running at that moment or using a different port than 8086:

...
Caused by: java.net.ConnectException: Failed to connect to /127.0.0.1:8086
...
Caused by: java.net.ConnectException: Connection refused (Connection refused)
...

I don’t think this is related to the syslog errors problem.

At least it looks like it doesn’t happen that often when using plain openjdk instead of Zulu Java, so it might be related to some strange behaviour of the JRE itself. openHAB 3 runs for one week now for me without showing the problem again.

rossko57 · March 21, 2021, 10:36am

Can you confirm when your syslog starts going mad, does the timestamp tally with the openhab.log?

manswiss · March 29, 2021, 10:06am

And it happens again yesterday.
Openhab-Logs:

openhab.log begins with first entry after restart
openhab.log.5 ends some days before (Mrz 26, 17:49)
openhab.log.6 is missing (not exists)
openhab.log.7 is arround 5MB big and begings with:

2021-03-28 18:00:16.802 [WARN ] [.incoming.MqttIncomingPublishService] - QoS 0 publish message dropped.
2021-03-28 18:00:16.804 [WARN ] [.incoming.MqttIncomingPublishService] - QoS 0 publish message dropped.
2021-03-28 18:00:16.804 [WARN ] [.incoming.MqttIncomingPublishService] - QoS 0 publish message dropped.
2021-03-28 18:00:16.805 [WARN ] [.incoming.MqttIncomingPublishService] - QoS 0 publish message dropped.
2021-03-28 18:00:16.805 [WARN ] [.incoming.MqttIncomingPublishService] - QoS 0 publish message dropped.
2021-03-28 18:00:16.806 [WARN ] [.incoming.MqttIncomingPublishService] - QoS 0 publish message dropped.

and ends with:

2021-03-28 18:00:56.981 [WARN ] [.core.thing.binding.BaseThingHandler] - Handler HomieThingHandler of thing mqtt:homie300:6afa898ba3:egentreedoorbell tried updating channel doorbell#ring although the handler was already disposed.
2021-03-28 18:00:56.981 [WARN ] [.core.thing.binding.BaseThingHandler] - Handler HomieThingHandler of thing mqtt:homie300:6afa898ba3:egentreedoorbell tried updating channel doorbell#ring although the handler was already disposed.
2021-03-28 18:00:56.981 [WARN ] [.core.thing.binding.BaseThingHandler] - Handler HomieThingHandler of thing mqtt:homie300:6afa898ba3:egentreedoorbell tried updating channel doorbell#ring although the handler was already disposed.
2021-03-28 18:00:56.981 [WARN ] [.core.thing.binding.BaseThingHandler] - Handler HomieThingHandler of thing mqtt:homie300:6afa898ba3:egentreedoorbell tried updating channel doorbell#ring although the handler was already disposed.
2021-03-28 18:00:56.982 [WARN ] [.core.thing.binding.BaseThingHandler] - Handler HomieThingHandler of thing mqtt:homie300:6afa898ba3:egentreedoorbell tried updating channel doorbell#ring although the handler was already disposed.

Between are all the same kind of messages (many of them).

In the syslog is also nothing between Mar 26, 17:50:49 and MAR 28, 18:01.01. Also the same gap as on openhab logs. And also the same gap on events.log from openhab.

I think (not 100% sure) but my restart (after dedecting that openhab items not response) was around MAR, 28, 18:00.

On /opt/zram/zram2/upper/openhab I see the missing openhab.log.6 with 0 bytes:
zram-folder

What can I do as next? I have no clue where I can search as next to solve the problem.
Thank you very mutch for your help.

manswiss · April 1, 2021, 3:02am

And it happens again. Now I was able to take a look into syslog before restart. Same java exceptions.
Please help me.
I don’t know what I can do now.

manswiss · April 7, 2021, 4:50am

It happens again and again.
I tryed the following setting with no luck:

org.eclipse.smarthome.threadpool:ruleEngine=100

Wolfgang_S · April 7, 2021, 7:11am

The openhab.log.6 entry in one of your previous posts is not a normal file;
it seems to be a character device …; besides that it is not owned by openhab:openhab
Do the repeated occurences of your problem also show that one of the files seem to be a character device ?

rossko57 · April 7, 2021, 9:36am

What happens when the log tries to roll over, and shuffle existing historic logs down one, I wonder?

manswiss · April 8, 2021, 8:42am

I think the problem of missing openhab.log.6 is, that the /var/log is full (because of syslog errors) and then i reboot the system. My guess is, that zram is then not able to write down the changes.

manswiss · April 8, 2021, 8:43am

I don’t know.

kempa007 · May 7, 2021, 9:58am

Hi,

I expirienced the same problem twice. Openhab 3 was running without problems and after system update (Raspberry Pi 4B) I got foold of the same logs. I upgraded java, but I see that people report it is not helping. Have you found out what was causing the problem? How to fix it? At the moment I use only two bindings- modbus and vallox.

manswiss · May 7, 2021, 6:20pm

No solution

ssalonen · September 1, 2021, 4:46pm

For what it’s worth, I am experiencing the same issue here. I get log entries similar to below

karaf[22367]: Exception in thread "pool-6-thread-52852303"         at java.base/java.util.concurrent.LinkedBlockingQueue.dequeue(LinkedBlockingQueue.java:214)
karaf[22367]:         at java.base/java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:435)
karaf[22367]:         at java.base/java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1054)
karaf[22367]:         at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1114)
karaf[22367]:         at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
karaf[22367]:         at java.base/java.lang.Thread.run(Thread.java:829)
karaf[22367]: java.lang.NullPointerException
karaf[22367]:         at java.base/java.util.concurrent.LinkedBlockingQueue.dequeue(LinkedBlockingQueue.java:214)
karaf[22367]:         at java.base/java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:435)
karaf[22367]:         at java.base/java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1054)
karaf[22367]:         at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1114)
karaf[22367]:         at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
karaf[22367]:         at java.base/java.lang.Thread.run(Thread.java:829)
karaf[22367]: Exception in thread "pool-6-thread-52852304" java.lang.NullPointerException
karaf[22367]:         at java.base/java.util.concurrent.LinkedBlockingQueue.dequeue(LinkedBlockingQueue.java:214)
karaf[22367]:         at java.base/java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:435)
karaf[22367]:         at java.base/java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1054)
karaf[22367]:         at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1114)
karaf[22367]:         at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
karaf[22367]:         at java.base/java.lang.Thread.run(Thread.java:829)
karaf[22367]: Exception in thread "pool-6-thread-52852305" java.lang.NullPointerException
karaf[22367]:         at java.base/java.util.concurrent.LinkedBlockingQueue.dequeue(LinkedBlockingQueue.java:214)
karaf[22367]:         at java.base/java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:435)
karaf[22367]:         at java.base/java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1054)
karaf[22367]:         at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1114)
karaf[22367]:         at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
karaf[22367]:         at java.base/java.lang.Thread.run(Thread.java:829)

The logging is very very rapid

This time, further debugging with live java heap dump shows that threads named pool-6-thread-XXX are associated with worked threads of SseResource executor. However, I have withnessed same error with other threads as well so it is just a coincidence, I think.

The error location seems to indicate something wrong with Java VM itself (relevant piece of code in openjdk source code)?

Oddly enough, google search with "LinkedBlockingQueue.java:214" only finds these openhab community posts.

My system specs:

openjdk version "11.0.11" 2021-04-20 LTS
OpenJDK Runtime Environment Zulu11.48+21-CA (build 11.0.11+9-LTS)
OpenJDK Client VM Zulu11.48+21-CA (build 11.0.11+9-LTS, mixed mode)

Raspi 4 running rasbian:

Linux lerbacka-raspi 5.10.17-v7l+ #1421 SMP Thu May 27 14:00:13 BST 2021 armv7l GNU/Linux

I will probably try to update zulu java, and possible try the workaround above Syslog Errors (100 GB) - #15 by Wolfgang1966

UPDATE: Now running Zulu 2021-07-20 java 11

openjdk version "11.0.12" 2021-07-20 LTS
OpenJDK Runtime Environment Zulu11.50+19-CA (build 11.0.12+7-LTS)
OpenJDK Client VM Zulu11.50+19-CA (build 11.0.12+7-LTS, mixed mode)

Also cleared the openHAB cache (stop openhab, sudo openhab-cli clean-cache, start openhab).

Will report how this goes. Based on history, problem seems to resurface in ~2 weeks

mstormi · September 1, 2021, 5:27pm

For those of you that run openHABian, try switching Java to AdoptOpenJDK to see if that makes a difference. It’s a simple menu option. Anyone else can also install it manually, of course.

PS: have you cleaned the cache

ssalonen · September 5, 2021, 7:14am

I got the same issue with Zulu 2021-07-20.

Now trying again with AdoptOpenJDK, clearing the cache first.

Will report how it goes.

ssalonen · September 23, 2021, 1:21pm

It’s been close to 3 weeks running with AdoptOpenJDK. So far it looks to be better than zulu… I used to got the errors discussed above every ~2 weeks. I will keep monitoring and report again

norbert_jordan · October 20, 2021, 4:04pm

@ssalonen any update? still working without the issue we are having?
Cheers, Thanks Norbert

ssalonen · October 20, 2021, 6:19pm

Unfortunately the installation has suffered (unrelated) power cuts, and therefore the uptime has been at most 2 weeks. I have not seen any negative issues so far.

I will keep running and report here after some weeks.

ssalonen · December 4, 2021, 3:34pm

Everything seems to be stable with AdoptOpenJDK, installation has been up 5 weeks now without any issues.

Oliver2 · December 4, 2021, 6:57pm

The fact that there is a menu option for AdoptOpenJDK, does that mean it is recommended to use AdoptOpenJDK?

mstormi · December 5, 2021, 11:43am

Of course not. Any option means that you can use it, nothing more.
How do you come to assume anything beyond that ?