All,
I would like to apologise personally to all of you for the annoying downtime that we experienced across our services today. This occurred on the worst possible day of the year, immediately following the release of openHAB 5.0, at a time when people needed to access our servers.
As part of a post-mortem analysis, I would like to share some background information on what happened:
Some time ago, our long-standing domain hoster for openhab.org decided that they would migrate all email infrastructure for their customers to Microsoft 365. They promised that this would be seamless and that existing configurations would be maintained. Last Friday (18 July) was the migration day for openhab.org, and unfortunately all email configurations were lost, most importantly the forwarding rules. This meant that I was no longer reachable at my primary developer address (kai@openhab.org), and worse still, the forum was unable to receive email posts. The support tickets that I opened on Saturday (19 July) have not yet been answered. The hotline was not reachable at all. It seemed that we weren't the only customer with issues, and that their support team was completely overwhelmed.
As the M365 email infrastructure is not what we are looking for (we prefer simple IMAP/SMTP infrastructure and plain email forwarders), and it turned out to be incompatible with our forum, I decided to resolve the issues by switching to a different domain hoster: Strato, a major player in Germany where I expected professional processes and a smooth transfer.
I therefore started the domain transfer on Monday (21 July), which was still before the openHAB 5.0 release. Based on my experience of past transfers, you typically shouldn't experience any downtime if the DNS records are set up correctly at the new hoster.
Unfortunately, the AuthCode that I received from the previous hoster was invalid for some reason, so Strato couldn't start the transfer. It only worked this morning (22 July) when a new AuthCode was issued.
So far, so good: the domain transfer had started. All I had to do was set the NS records for the DNS servers (which we operate at Cloudflare). However, it then transpired that the Strato customer centre wouldn't allow me to configure the nameservers until the transfer was complete. This meant that they were pushing their own nameservers with an A record that pointed to one of their servers, displaying the page that some of you may have seen.
I spent an hour on their hotline trying to convince them to change the NS entries for openhab.org. They simply wouldn't do it, or rather, they didn't understand the issue and refused to put me through to anyone who did. The only option left was to wait until the configuration was finally possible later that afternoon. I set the correct NS records as soon as I could, and it then took another one to two hours for them to be picked up by the major DNS servers, after which our servers became available again.
TL;DR: Sorry for the inconvenience! It was a combination of bad luck and extreme time pressure, and I messed things up by being too optimistic that everything would work as it should. I should be old enough to know that this is usually not the case. I promise to do better next time!