Script to restart OpenHAB when bridge offline?

Dear all,

I use OpenHAB for my WAVIN floor heating system through a USB adaptor. It works great, except that sometimes when OpenHAB is restarted it complains that the USB device is not available (see example message below):

**COMMUNICATION_ERROR**
Could not find a gateway on given path '/dev/ttyUSB0', 0 ports available.

The gateway is then marked “Offline” and never comes back online after another 1-2 restarts. I run OH3 as a Docker container and I’m not sure how to solve this problem properly, so I’d like to make a workaround: maybe a shell script that keeps restarting OH3 (with the docker restart command) until the bridge is online. Something like:

  1. Restart OH3
  2. Wait for 10min
  3. Check if the gateway is online
  4. If not, go back to 1

My knowledge of shell scripting and OH isn’t good enough, so I need some help here:
How can I check if a certain thing in OH3 is online or not?
And is there any better way of solving this problem?

Thank you very much in advance!

Why is openHAB restarted at all?
Mine is only restarted in case of an update. In general it is running 24/7 without the need for a restart.

Testing, OS upgrade, upgrade etc.
Not very often, but a couple of times every month.

Should not be necessary. I do tests on a separate machine and apply OS updates only if critical or when updating openHAB.

I would first try to analyze the root cause of why the device is not coming up, e.g. whether it gets a different name assigned.
Several users do a scripted restart of their instance or of a dedicated bundle. Examples should show up by searching the forum.
With regard to your approach of rebooting until the problem fixes ‘itself’: this may end up in endless restarts.

Hi @Wolfgang_S : thanks for your reply!
Yeah, I think it would be great to find the root cause, but I lack the expertise needed to troubleshoot the issue. I posted my issue a few weeks ago but didn’t receive any reply, hence the workaround in this post.

I have checked the obvious things like naming and permissions, and also searched around but couldn’t find a similar problem (most posts are about initial setup with wrong arguments or permissions). In my case restarts cause the problem randomly.

Hmmm. Go through the steps to make this error happen and then look to see if /dev/ttyUSB1 appears. I’m wondering if the binding is failing to release the lock on the device when OH restarts.

The fact that Docker is involved complicates matters some. How are you restarting OH? Just restarting the process or restarting the whole container? If not restarting the whole container I can easily see how the lock wouldn’t be removed because Docker still has it but OH is now a new process asking for access and things just get confused.

Because you are running in Docker, you’ll need to somehow run this script outside the container since it’s the whole container you’ll want to restart. That means you can’t run it from OH itself. Given this complication, I’d recommend spending time figuring out what’s happening with the device and fixing that instead of trying to work around it by restarting OH.

You’d have to set up a service that periodically queries OH (maybe checking the Thing’s status through OH’s REST API) and restart the container if necessary.
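A rough sketch of such a watchdog, to be run on the Docker host rather than inside the container. The container name, Thing UID and token are placeholders (assumptions), and the port 8081 is simply the HTTP port visible in the ps output further down this thread. It assumes the REST API's Thing status endpoint (`/rest/things/{thingUID}/status`) is reachable and, if authentication is required, that an API token is accepted as a Bearer token; verify both against your own instance. The restart count is capped so it cannot loop forever.

```shell
#!/bin/sh
# Watchdog sketch: restart the openHAB container until the bridge Thing is ONLINE.
# Run it on the Docker host, not inside the container.

# All values below are placeholders (assumptions); adjust them to your setup.
CONTAINER="${CONTAINER:-openhab}"            # hypothetical container name
OH_URL="${OH_URL:-http://localhost:8081}"    # HTTP port as seen in this thread
THING_UID="${THING_UID:-binding:bridge:id}"  # placeholder UID of the bridge Thing
OH_TOKEN="${OH_TOKEN:-}"                     # API token, if your instance needs one
WAIT_SECONDS="${WAIT_SECONDS:-600}"          # 10 minutes between restart and check
MAX_RESTARTS="${MAX_RESTARTS:-5}"            # give up instead of restarting forever

# Print the first "status" value found in the JSON read from stdin.
parse_status() {
    grep -o '"status"[[:space:]]*:[[:space:]]*"[A-Z_]*"' \
        | head -n 1 \
        | sed 's/.*"\([A-Z_]*\)"$/\1/'
}

# Fetch the Thing's status JSON from the REST API.
fetch_status_json() {
    if [ -n "$OH_TOKEN" ]; then
        curl -s -H "Authorization: Bearer $OH_TOKEN" \
            "$OH_URL/rest/things/$THING_UID/status"
    else
        curl -s "$OH_URL/rest/things/$THING_UID/status"
    fi
}

restart_container() {
    docker restart "$CONTAINER"
}

# Restart, wait, check; repeat until ONLINE or until MAX_RESTARTS is reached.
restart_until_online() {
    attempt=0
    while [ "$attempt" -lt "$MAX_RESTARTS" ]; do
        attempt=$((attempt + 1))
        restart_container
        sleep "$WAIT_SECONDS"
        status=$(fetch_status_json | parse_status)
        echo "attempt $attempt: bridge status is '${status:-unknown}'"
        if [ "$status" = "ONLINE" ]; then
            return 0
        fi
    done
    echo "giving up after $MAX_RESTARTS restarts" >&2
    return 1
}

# Only start the loop when explicitly asked to, e.g. ./oh-watchdog.sh --run
if [ "${1:-}" = "--run" ]; then
    restart_until_online
fi
```

Saved under a name of your choice (oh-watchdog.sh here is just an example) it could be started by hand after a restart, or from cron or a systemd unit on the host. Note that it always restarts first and checks afterwards, matching the steps in the first post; if you run it periodically you would want to check the status before the first restart.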

When the problem occurs, the device /dev/ttyUSB0 exists (accessible both on the host and in the Docker container). The permission is set to 777 when the host starts. I also think the device is held by some ghost process that is not cleaned up when the container is restarted; however, the lsof command always returns an empty list (Docker complicates things, as you suggested), regardless of whether it’s working or not.

I restart the whole container with the docker restart command. If it’s a lock problem I’d expect to see something in the log, but here I only see an IOException. Is there any way to check whether the device is locked by some ghost process?

Thank you in advance!

Edit: since lsof doesn’t work, I found another command which seems to work. So now I can see OH is using the device:

[~] # ps ax |grep tty
15535 9001     594576 S   /usr/lib/jvm/default-jvm/bin/java -XX:-UsePerfData -Dopenhab.home=/openhab -Dopenhab.conf=/openhab/conf -Dopenhab.runtime=/openhab/runtime -Dopenhab.userdata=/openhab/userdata -Dopenhab.logdir=/openhab/userdata/logs -Dfelix.cm.dir=/openhab/userdata/config -Djava.library.path=/openhab/userdata/tmp/lib -Djetty.host=0.0.0.0 -Djetty.http.compliance=RFC2616 -Dnashorn.args=--no-deprecation-warning -Dorg.apache.cxf.osgi.http.transport.disable=true -Dorg.ops4j.pax.web.listening.addresses=0.0.0.0 -Dorg.osgi.service.http.port=8081 -Dorg.osgi.service.http.port.secure=8082 -Djava.awt.headless=true -Dfile.encoding=UTF-8 -XX:+UseG1GC -Dgnu.io.rxtx.SerialPorts=/dev/ttyUSB0 -Duser.timezone=Europe/Copenhagen --add-reads=java.xml=java.logging --add-exports=java.base/org.apache.karaf.specs.locator=java.xml,ALL-UNNAMED --patch-module java.base=/openhab/runtime/lib/endorsed/org.apache.karaf.specs.locator-4.3.4.jar --patch-module java.xml=/openhab/runtime/lib/endorsed/org.apache.karaf.specs.java.xml-4.3.4.jar --add-opens java.base/java.security=ALL-UNNAMED --add-opens java.base/java.net=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.util=ALL-UNNAMED --add-opens java.naming/javax.naming.spi=ALL-UNNAMED --add-opens java.rmi/sun.rmi.transport.tcp=ALL-UNNAMED --add-opens java.base/java.io=ALL-UNNAMED --add-opens java.base/java.lang.reflect=ALL-UNNAMED --add-opens java.base/java.text=ALL-UNNAMED --add-opens java.base/java.time=ALL-UNNAMED --add-opens java.desktop/java.awt.font=ALL-UNNAMED --add-exports=java.base/sun.net.www.protocol.file=ALL-UNNAMED --add-exports=java.base/sun.net.www.protocol.ftp=ALL-UNNAMED --add-exports=java.base/sun.net.www.protocol.http=ALL-UNNAMED --add-exports=java.base/sun.net.www.protocol.https=ALL-UNNAMED --add-exports=java.base/sun.net.www.protocol.jar=ALL-UNNAMED --add-exports=java.base/sun.net.www.content.text=ALL-UNNAMED --add-exports=jdk.xml.dom/org.w3c.dom.html=ALL-UNNAMED 
--add-exports=jdk.naming.rmi/com.sun.jndi.url.rmi=ALL-UNNAMED --add-export

I know ttyUSB0 exists, but is there now a ttyUSB1? That’d indicate that the device got disconnected and reconnected and now appears as a new device.
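To make that check repeatable, a small helper along these lines could be run on the host before and after a restart and the two outputs compared. The function name is made up for illustration, and it only lists device nodes; it does not tell you why a node changed.

```shell
#!/bin/sh
# Print the USB serial device nodes found in a directory (default: /dev),
# one per line. Covers both ttyUSB* and ttyACM* style names.
list_usb_serial() {
    dir="${1:-/dev}"
    for node in "$dir"/ttyUSB* "$dir"/ttyACM*; do
        # An unmatched glob stays literal, so skip anything that does not exist.
        [ -e "$node" ] && echo "$node"
    done
    return 0
}
```

Calling `list_usb_serial` with no argument looks in /dev. If a ttyUSB1 shows up after the restart, `dmesg | grep -i ttyUSB` on the host (may need root) should show whether the kernel detached and re-attached the adapter.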

No, if the lock exists OH would be prevented from reading/writing, which will show up as an IOException. Only the OS knows about and enforces the lock. OH wouldn’t know why it can’t read/write, only that it can’t.

The fact you are on QNAP even further complicates matters because they do all sorts of weird stuff with hardware. We used to officially support QNAP but eventually had to drop it because it was just too hard to make OH work. I wouldn’t be surprised if that is coming into play here too.

Regardless, OH has no access to the Docker it’s running on top of (would be pretty poor isolation if it did) so you won’t be able to restart the container from inside the container (i.e. OH cannot restart its own container).

There is still a problem with openHAB leaving a legacy lockfile that only seems to be removed by running as root. There is a thread on here about it.
I have the same problem with Z-Wave on a ttyACM device.
Running as root usually does the trick. I then stop and restart as usual.
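If you want to look for such a leftover lockfile yourself, here is a minimal sketch. It assumes the serial library uses the traditional UUCP-style convention, i.e. a file named LCK..<device> in /var/lock containing the owner's PID; whether that is the exact path in your setup is an assumption to verify. Because PIDs differ between host and container, it should be run inside the openHAB container (e.g. via docker exec).

```shell
#!/bin/sh
# Report whether a UUCP-style lock file exists for a serial device and whether
# the process recorded in it is still alive.
# Usage: check_serial_lock ttyUSB0 [lock_dir]
check_serial_lock() {
    dev="$1"
    lock_dir="${2:-/var/lock}"       # assumed lock directory
    lock_file="$lock_dir/LCK..$dev"  # assumed UUCP-style lock file name

    if [ ! -e "$lock_file" ]; then
        echo "no lock file"
        return 0
    fi

    # Such lock files conventionally hold the owner's PID as the first field.
    pid=$(awk '{print $1; exit}' "$lock_file")

    # kill -0 only tests for existence; fall back to /proc in case the
    # process belongs to another user and the signal is not permitted.
    if [ -n "$pid" ] && { kill -0 "$pid" 2>/dev/null || [ -d "/proc/$pid" ]; }; then
        echo "locked by running process $pid"
    else
        echo "stale lock file: $lock_file"
    fi
}
```

If `check_serial_lock ttyUSB0` reports a stale lock file while OH is stopped, removing that file (which may need root, as mentioned above) before starting OH again would be the thing to try.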