MQTT connection to broker was lost -- exactly every 24 hours

I noticed that my mosquitto broker loses its connection at 06:25… every single day.
Went back in the logs, but I only keep the last four days, so I do not know when this started.
I have owntracks installed, and ot-recorder on the same RPi… but I don’t see whether this is related at all.

2017-02-20 06:25:28.468 [ERROR] [o.i.t.m.i.MqttBrokerConnection] - MQTT connection to broker was lost
org.eclipse.paho.client.mqttv3.MqttException: Connection lost
        at org.eclipse.paho.client.mqttv3.internal.CommsReceiver.run(CommsReceiver.java:138) [mqtt-client-0.4.0.jar:na]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_65]
Caused by: java.io.EOFException: null
        at java.io.DataInputStream.readByte(DataInputStream.java:267) ~[na:1.8.0_65]
        at org.eclipse.paho.client.mqttv3.internal.wire.MqttInputStream.readMqttWireMessage(MqttInputStream.java:56) ~[na:na]
        at org.eclipse.paho.client.mqttv3.internal.CommsReceiver.run(CommsReceiver.java:100) [mqtt-client-0.4.0.jar:na]
        ... 1 common frames omitted
2017-02-20 06:25:28.476 [ERROR] [o.i.t.m.i.MqttBrokerConnection] - MQTT connection to 'mymosquitto' was lost: Connection lost : ReasonCode 32109 : Cause : null
2017-02-20 06:25:28.478 [INFO ] [o.i.t.m.i.MqttBrokerConnection] - Starting connection helper to periodically try restore connection to broker 'mymosquitto'
2017-02-20 06:25:38.480 [INFO ] [o.i.t.m.i.MqttBrokerConnection] - Starting MQTT broker connection 'mymosquitto'

Any ideas on how to get to the bottom of this?

I am not too worried, but since I rely heavily on MQTT messages, I can imagine that I may miss some while mosquitto reconnects – which seems to take only 12 ms…

Should I be worried at all? Maybe not in practice, but wanting to run clean and well-behaved systems, I do wonder.
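If the worry is messages dropped during that 12 ms reconnect window, one mitigation is to raise the QoS on the binding side so the broker queues undelivered messages for the client. A sketch only – this assumes the openHAB MQTT 1.x transport config in services/mqtt.cfg, the broker name 'mymosquitto' from your log, and a localhost broker; adjust to your setup:

```
# services/mqtt.cfg (a sketch -- URL, clientId, and credentials are assumptions)
mqtt:mymosquitto.url=tcp://localhost:1883
# A fixed clientId lets the broker keep a session for this client
mqtt:mymosquitto.clientId=openhab
# QoS 1 = at-least-once delivery; the broker re-delivers messages
# that were in flight while the client was disconnected
mqtt:mymosquitto.qos=1
```

With the default QoS 0, anything published during the reconnect is simply gone.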

What system is running your broker? Does it have a daily cron job (try sudo crontab -e)? I had a similar issue where my broker connection kept dropping because the system it was running on did a heavy job every day at 3:30am. I moved the broker to a different box and the broker connection is no longer failing.


Or possibly a logrotate is running on the mosquitto broker?


I don’t have anything helpful to add, but I thought it was curious that my OH instance seems to restart nearly every morning at 06:25. It typically follows some errors from the TCP/UDP binding about a NullPointerException. I’m not actually using that binding (I was just testing), so I removed it and will see if anything changes.

It seems like I also get some complaints about HomeKit every morning.


Thanks!

I found:

25 6    * * *   root    test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )

in /etc/crontab
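For what it’s worth, the first two schedule fields are minute and hour, so that entry fires at 06:25 daily – exactly the disconnect time. A quick sanity check of the decoding (nothing here is specific to your box; the quoted line is just a trimmed copy of the crontab entry):

```shell
# Decode the cron schedule: field 1 = minute, field 2 = hour.
# "25 6 * * *" therefore fires at 06:25 every day.
line='25 6    * * *   root    run-parts --report /etc/cron.daily'
echo "$line" | awk '{ printf "%02d:%02d\n", $2, $1 }'
# prints 06:25
```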

I assume that setting a different time for this will shift the broker disconnect?!
So, other than that, there is nothing I can do about it?

Thanks, though it is running at 00:00 :wink:

You never know :slight_smile:

I think the two issues you are seeing are unrelated…
I reckon you will still get the broker disconnect at 06:25…

I think you are correct. I jumped to the conclusion that something was causing OH to crash and restart because everything appears to get reloaded at that time. It seems we must be experiencing something related, but I’m not sure why only MQTT would be affected rather than the whole system.

2017-02-17 06:25:12.196 [INFO ] [io.openhabcloud.internal.CloudClient] - Connected to the openHAB Cloud service (UUID = d2e80654-5667-41ef-9d0c-e6dcb0da360c, base URL = http://localhost:8080)
2017-02-17 06:25:13.568 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'sensors.items'
2017-02-17 06:25:13.687 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'outlets.items'
2017-02-17 06:25:13.708 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'lights.items'
2017-02-17 06:25:13.711 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'CT30.items'
2017-02-17 06:25:14.850 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'thermostat.rules'
2017-02-17 06:25:15.086 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'prowl.rules'
2017-02-17 06:25:15.097 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'manualSwitches.rules'
2017-02-17 06:25:15.111 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'startup.rules'
2017-02-17 06:25:15.688 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'default.sitemap'
2017-02-17 06:25:16.840 [INFO ] [.dashboard.internal.DashboardService] - Started dashboard at /start
2017-02-17 06:25:17.079 [INFO ] [basic.internal.servlet.WebAppServlet] - Started Basic UI at /basicui/app
2017-02-17 06:25:17.182 [INFO ] [arthome.ui.paper.internal.PaperUIApp] - Started Paper UI at /paperui
2017-02-17 06:25:17.224 [INFO ] [panel.internal.HABPanelDashboardTile] - Started HABPanel at /habpanel
2017-02-17 06:25:17.440 [INFO ] [b.core.service.AbstractActiveService] - HTTP Refresh Service has been started
2017-02-17 06:25:17.521 [INFO ] [penhab.io.transport.mqtt.MqttService] - MQTT Service initialization completed.
2017-02-17 06:25:17.524 [INFO ] [t.mqtt.internal.MqttBrokerConnection] - Starting MQTT broker connection 'broker'
2017-02-17 06:25:17.981 [INFO ] [ulation.internal.HueEmulationServlet] - Started Hue Emulation service at /api
2017-02-17 06:25:18.761 [INFO ] [ui.habmin.internal.servlet.HABminApp] - Started HABmin servlet at /habmin
2017-02-17 06:25:19.002 [INFO ] [assic.internal.servlet.WebAppServlet] - Started Classic UI at /classicui/app
2017-02-17 06:53:07.180 [WARN ] [io.openhabcloud.internal.CloudClient] - Jetty request 282031 failed: null
2017-02-17 06:53:16.063 [WARN ] [io.openhabcloud.internal.CloudClient] - Jetty request 282060 failed: null
2017-02-17 09:39:58.878 [ERROR] [lfe.hap.impl.http.impl.BinaryHandler] - Exception in binary handler

Sorry for reviving this thread… but!

It is funny when you research an error and come across your own post. Long ago I used to be a SysAdmin; however, the one thing I took away was “no red error logged in a production system” – as in it was unacceptable and had to be investigated and resolved; up to the point of engaging the vendor to fix it.

So as it turns out, I am a bit wiser now than I was then (in Feb)… and dug deeper. :slight_smile:

cat /etc/crontab

# /etc/crontab: system-wide crontab
SHELL=/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin

# m h dom mon dow user  command
17 *    * * *   root    cd / && run-parts --report /etc/cron.hourly
25 6    * * *   root    test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )
47 6    * * 7   root    test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.weekly )
52 6    1 * *   root    test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.monthly )

… so it runs the daily jobs…

ls /etc/cron.daily/
apache2  apt  aptitude  apt-show-versions  bsdmainutils  dpkg  logrotate  man-db  ntp  passwd  samba

It has a logrotate job, which points me to:

ls /etc/logrotate.d/
apache2  apt  aptitude  dpkg  mosquitto  rsyslog  samba

Huh, which has a mosquitto file:

cat /etc/logrotate.d/mosquitto
/var/log/mosquitto/mosquitto.log {
        rotate 7
        daily
        compress
        size 100k
        nocreate
        missingok
        postrotate
                /usr/bin/killall -HUP mosquitto
        endscript
}

and here we go: a HUP signal sent to the broker. This is what causes the disconnect message in openHAB’s openhab.log.

My mosquitto.conf looks like this:

cat /etc/mosquitto/mosquitto.conf
# Place your local configuration in /etc/mosquitto/conf.d/
#
# A full description of the configuration file is at
# /usr/share/doc/mosquitto/examples/mosquitto.conf.example

pid_file /var/run/mosquitto.pid

persistence true
persistence_location /var/lib/mosquitto/

log_dest file /var/log/mosquitto/mosquitto.log

include_dir /etc/mosquitto/conf.d

… and I changed the log_dest to none:

log_dest none

and then removed:

/etc/logrotate.d/mosquitto

I thought I’d post this in case somebody else needs a solution to this problem.
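An alternative, if you would rather keep the broker log than switch it off: logrotate’s copytruncate directive rotates the file without ever signalling mosquitto, so no postrotate/HUP is needed. A sketch, untested on your setup:

```
# /etc/logrotate.d/mosquitto -- rotate without touching the broker.
# copytruncate copies the log, then truncates the original in place,
# so mosquitto keeps writing to the same open file descriptor.
/var/log/mosquitto/mosquitto.log {
        rotate 7
        daily
        compress
        size 100k
        missingok
        copytruncate
}
```

The trade-off: log lines written between the copy and the truncate can be lost, which is usually acceptable for a broker log.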

In theory… this shouldn’t happen…
The HUP signal would only close and re-open the log file (/var/log/mosquitto/mosquitto.log) not restart the service…

Source: https://mosquitto.org/man/mosquitto-conf-5.html (log_dest destinations)

At least… it doesn’t happen in my case when I issue manually killall -HUP mosquitto :slight_smile:

Usually I see this behavior when no pid file is set in the config (but you have one set).

I found this happening as well, quite a while ago. It was suggested to me to use the rsyslog daemon to handle logging and this makes life a lot easier.

Instead of log_dest file I use log_dest syslog and then have rsyslog configured to pipe the mosquitto logs to a separate file (/var/log/mosquitto.log). Then I just configure logrotate to rotate that file in the same way all syslog logs are rotated.

Means mosquitto never has to be prodded when rotating logs.
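For anyone wanting to copy this approach, a minimal sketch of the rsyslog side (the drop-in filename is my own choice, and this assumes rsyslog’s legacy filter syntax):

```
# /etc/rsyslog.d/30-mosquitto.conf
# With "log_dest syslog" in mosquitto.conf, route the broker's messages
# to their own file, and stop them from also landing in /var/log/syslog.
if $programname == 'mosquitto' then /var/log/mosquitto.log
& stop
```

Then add /var/log/mosquitto.log to a logrotate config of its own (no postrotate needed for mosquitto) and restart rsyslog.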


Respectfully disagree based on my observation: it should… :slight_smile:
But you’re right, according to the man page, it shouldn’t: “The file will be closed and reopened when the broker receives a HUP signal.”

The HUP signal (historically the “hangup signal”) is usually sent to a program to request that it restarts and re-reads all its configuration in the process. The actual behaviour of the program depends on the program-specific implementation. Not all programs catch HUP and they aren’t required to by convention or dogma. For example, the Apache web server will catch a HUP signal and re-read all its configuration files but it won’t restart any processes.

By the looks of it, mosquitto seems to stop, which produces the disconnect message, and state, in OH.

Thanks, I will look into this…

I understand, just wanted to mention that according to the man page… this shouldn’t happen (?)

Of course, there is a gap between theory and practice :slight_smile:

I found this on github:

Source: mosquitto.c

#ifdef SIGHUP
/* Signal handler for SIGHUP - flag a config reload. */
void handle_sighup(int signal)
{
	flag_reload = true;
}
#endif

Source: loop.c

		if(flag_reload){
			_mosquitto_log_printf(NULL, MOSQ_LOG_INFO, "Reloading config.");
			mqtt3_config_read(db->config, true);
			mosquitto_security_cleanup(db, true);
			mosquitto_security_init(db, true);
			mosquitto_security_apply(db);
			mqtt3_log_init(db->config->log_type, db->config->log_dest);
			flag_reload = false;
		}

According to my understanding, it should not disconnect the clients on a SIGHUP…

I also found statements of the following kind: “Reloaded on reload signal. Note that currently connected clients will be unaffected by any changes.” Source: mosquitto-conf-5

It seems that some config parameters will trigger a client disconnect when mosquitto reloads its config… I’m not sure which ones :slight_smile:
Maybe authentication? (It would make sense, since you could change this in the config and would want the clients to reconnect using the new settings.) See here: https://lists.launchpad.net/mosquitto-users/msg00208.html

On the other hand… I currently use auth in my mosquitto, and my clients don’t get disconnected when I send a SIGHUP to the daemon… (or when logrotate kicks in for mosquitto.log)

Relevant fun video: Have the boys found the leak yet? (The Clash - The Leader)
