Plugwise binding just stops

bryeng · September 28, 2016, 8:45am

Hey!

I am having problems with the Plugwise binding stopping after a random amount of time. I have tried touching the Plugwise binding (updating the modification date of its .jar file so it gets reloaded), but it fails again.
The error message I get is: “Failed to schedule quartz-job”.
My suspicion is that the Quartz thread queue is full, and scheduling therefore fails. Is there any way of resetting Quartz while openHAB is running?
If not, is it possible to have a rule that restarts the whole program when the binding stops?

Thank you!

frederic · October 5, 2016, 12:45pm

Hi bryeng,

I have a similar problem: after a while (randomly, between 5 and 15 minutes), the Plugwise binding stops.
In my logs I have something like this:
:34:34.936 [ERROR] [inding.plugwise.internal.Stick:445 ] - Plugwise protocol message error: 00001A3B00C1B88A
13:34:34.950 [DEBUG] [inding.plugwise.internal.Stick:441 ] - Plugwise protocol header error: 000D in message 000D6F0001A4078D

Some kind of protocol error.

When I restart it from the OSGi console, I get the "Failed to schedule quartz-job" error message, but things start working again.

16-10-05 14:37:30.307 [ERROR] [inding.plugwise.internal.Stick] - Plugwise protocol message error: ???003F25C2000D6F0001A4078D0C251D0201457A96AA
2016-10-05 14:37:30.322 [ERROR] [inding.plugwise.internal.Stick] - Plugwise protocol message error:
???000025C500C10A96
2016-10-05 14:37:30.397 [INFO ] [runtime.busevents ] - Lamp_Switch state updated to ON
2016-10-05 14:37:30.398 [INFO ] [runtime.busevents ] - Nevara_Switch state updated to ON
2016-10-05 14:38:15.977 [INFO ] [.service.AbstractActiveService] - Plugwise Refresh Service has been shut down
2016-10-05 14:38:33.687 [ERROR] [b.plugwise.internal.CirclePlus] - Error scheduling Circle+ setClock Quartz Job
org.quartz.ObjectAlreadyExistsException: Unable to store Job : ‘Plugwise.000D6F0001A4078D-SetCirclePlusClock’, because one already exists with this identification.

For the moment, I’ll monitor the logs and relaunch the OSGi module from an external cron script.

I used the patched 1.9.0 binding together with the serial bundle from 1.8.3.

Hope that helps.
I think those protocol errors should not stop the module from working, but I don’t know if that can be done.

bryeng · October 6, 2016, 1:07pm

That is exactly the same error as mine. I run openHAB from the command prompt, so how do you get to the OSGi console?

frederic · October 6, 2016, 3:42pm

Simply from the console where the openHAB server logs are shown.
Hit Return and then type ss, for example.
You will see all the OSGi modules.

bryeng · October 7, 2016, 1:06pm

Does that work for you?
If I restart the OSGi module, switching starts working again, but I still have the same problem getting information from the Plugwise devices.

wborn · October 7, 2016, 4:47pm

@bryeng, is the missing information you are referring to the current power? I fixed that with #4669 less than a week ago. You can download an updated version of the binding with this fix from the openHAB build server.

I also get the "Error scheduling Circle+ setClock Quartz Job" exception at startup. But it is thrown only once at startup and does not cause any major failures for me. It would be better if it were fixed, though.
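The ObjectAlreadyExistsException logged with this error means a job with the same identity was still registered when the binding tried to store it again. One plausible cause (an interpretation, not confirmed in this thread) is the binding being reloaded while its old jobs are still registered. In Quartz 2.x this is usually avoided by calling `Scheduler.checkExists(JobKey)` or `Scheduler.deleteJob(JobKey)` before scheduling. The sketch below shows the same "replace instead of fail" idea with plain `java.util.concurrent` so it runs without Quartz. `KeyedScheduler` is hypothetical and not necessarily how the binding or PR #4700 handles it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical keyed scheduler: scheduling under a key that is already in use
// cancels and replaces the old job instead of throwing an "already exists" error.
final class KeyedScheduler {
    private final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    private final Map<String, ScheduledFuture<?>> jobs = new ConcurrentHashMap<>();

    // Runs task every periodMillis under the given key, replacing any job with the same key.
    void scheduleReplacing(String key, Runnable task, long periodMillis) {
        ScheduledFuture<?> future =
                executor.scheduleAtFixedRate(task, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        ScheduledFuture<?> previous = jobs.put(key, future); // atomic swap of the registered job
        if (previous != null) {
            previous.cancel(false); // drop the stale job left over from an earlier run
        }
    }

    // Number of registered jobs that are still scheduled.
    int activeJobCount() {
        return (int) jobs.values().stream().filter(f -> !f.isDone()).count();
    }

    void shutdown() {
        jobs.values().forEach(f -> f.cancel(false));
        executor.shutdownNow();
    }

    public static void main(String[] args) {
        KeyedScheduler scheduler = new KeyedScheduler();
        String key = "Plugwise.000D6F0001A4078D-SetCirclePlusClock";
        scheduler.scheduleReplacing(key, () -> { }, 60_000);
        scheduler.scheduleReplacing(key, () -> { }, 60_000); // replaces the first job, no exception
        System.out.println(scheduler.activeJobCount()); // prints 1
        scheduler.shutdown();
    }
}
```

Replacing the old job, rather than skipping the new one, ensures the job that keeps running is the one created by the current instance of the binding.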

bryeng · October 11, 2016, 10:43am

Thank you for the response.
That is correct. I tried your binding, but I still get the same error:
2016-10-11 10:00:41.733 [ERROR] [inding.plugwise.internal.Stick] - Plugwise protocol message error: 0000AA2000C11423
ÿ

Does anyone have any idea what the error might be, or why it happens?

wborn · October 11, 2016, 6:06pm

I’ve seen this error only once so far. When it occurred I restarted openHAB to fix it. Here are some ideas on what may be causing it:

- Outdated firmware in Circles; you can check the firmware version in Source. It should be from 2011 or newer.
- USB power issues, a common problem when using a cheap power adapter with Raspberry Pis; test a better USB power adapter or another PC.
- Some OS USB driver issue; on Linux, something may be logged about it in /var/log/syslog or /var/log/kern.log. Try another (or a fixed) OS/PC.
- Faulty USB port or chip; test with another port/PC, or see if another device works on the same port.
- Some other program that changes the serial port settings. When it occurs, you could check whether the port settings are still correct. The correct ones are initialized in org.openhab.binding.plugwise.internal.Stick.initialize():
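```
// 115200 baud, 8 data bits, 1 stop bit, no parity (8N1); constants are static, so use the class name
serialPort.setSerialPortParams(115200, SerialPort.DATABITS_8, SerialPort.STOPBITS_1,
        SerialPort.PARITY_NONE);
```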
- Something in the Stick itself that causes it. Perhaps Source can recover from it by re-initializing the Stick. I’ve also seen Source temporarily lose its connection with the Stick, but only occasionally and not every 15 minutes. Also, the Source software may be a bit smarter about recovering from corrupted messages.

Because I rarely have this issue, I think in my case the last scenario occurred. In your case probably one of the others.
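To check the serial port settings when the error occurs, the values read back from the open port (gnu.io's `SerialPort` offers `getBaudRate()`, `getDataBits()`, `getStopBits()` and `getParity()`) can be compared with the ones set in `Stick.initialize()`. A minimal sketch follows. The `SerialSettingsCheck` class, its `Settings` record and the plain-text parity label are hypothetical stand-ins so the check runs without a serial library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical check of the serial settings the Plugwise Stick expects (115200 baud, 8N1).
final class SerialSettingsCheck {
    // Snapshot of the port settings; parity is a plain label ("none", "even", ...)
    // so this sketch does not depend on any serial library constants.
    record Settings(int baudRate, int dataBits, int stopBits, String parity) {}

    static final Settings EXPECTED = new Settings(115200, 8, 1, "none");

    // Returns one message per mismatching setting; an empty list means the settings are intact.
    static List<String> mismatches(Settings actual) {
        List<String> problems = new ArrayList<>();
        if (actual.baudRate() != EXPECTED.baudRate()) {
            problems.add("baud rate is " + actual.baudRate() + ", expected " + EXPECTED.baudRate());
        }
        if (actual.dataBits() != EXPECTED.dataBits()) {
            problems.add("data bits are " + actual.dataBits() + ", expected " + EXPECTED.dataBits());
        }
        if (actual.stopBits() != EXPECTED.stopBits()) {
            problems.add("stop bits are " + actual.stopBits() + ", expected " + EXPECTED.stopBits());
        }
        if (!actual.parity().equalsIgnoreCase(EXPECTED.parity())) {
            problems.add("parity is " + actual.parity() + ", expected " + EXPECTED.parity());
        }
        return problems;
    }

    public static void main(String[] args) {
        // Simulated reading where another program has changed the baud rate.
        Settings observed = new Settings(9600, 8, 1, "none");
        System.out.println(mismatches(observed)); // prints [baud rate is 9600, expected 115200]
    }
}
```

With the real library you would build `Settings` from the port's getters and compare against the `SerialPort` constants instead of the literals used here.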

Max1968 · October 11, 2016, 6:36pm

Hi Brynjar,
hi Frederic,

How many Plugwise devices are you using? Back when I was using Plugwise it showed very unstable behaviour (there should be some posts from me here), and I was only able to improve it by rebooting my openHAB server each night. I had about 35 plugs etc., and it might indeed be the number of devices or the mesh of my ZigBee network that caused these issues. Meanwhile, I have replaced all my Plugwise devices with Z-Wave ones and use a Zipabox for these Z-Wave devices, occasionally polling their status from openHAB, and I’m not facing these issues anymore. It’s not the configuration I originally intended, but as openHAB does not yet support Z-Wave security, it was the best compromise I could get for now.

wborn · October 11, 2016, 8:14pm

@bryeng can you tell us a bit about your setup? Is it Linux/Windows? Using a mini PC like a Pi? Does it all work well with Source?

I’ve just created PR #4700, which should fix the "Error scheduling Circle+ setClock Quartz Job" error. That will probably not fix anything for you, though, because it was a caught and logged exception anyway. More importantly, this PR makes switching Circles faster and more reliable.

bryeng · October 12, 2016, 9:59am

Hi everyone, and thank you for responding.

I run openHAB on a Windows 7 computer, and everything works fine with Source.
I have four Circles near the computer, so distance should not be an issue.
Currently the interval is 200 ms on the ZigBee network.

I have tried several methods to restart the Plugwise binding.
First I changed the modification date of the .jar file to force openHAB to reload the binding. That resulted in “Failed to schedule Quartz job”, and the refreshing also stopped.
Then I tried to restart the binding from the OSGi command line. After that, the Plugwise devices started to send information again, but only the timestamps.

The same plugwise devices have also been used by my professor who experienced the same problems.

I am thinking of making a “quick fix” that just checks the log for updates and restarts openHAB when Plugwise hangs.
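A minimal sketch of such a quick fix, assuming openHAB writes item updates to a log file such as logs/events.log (the path and the 10-minute threshold are assumptions): the watchdog only decides whether the file has been silent for longer than a chosen threshold, and how openHAB is actually restarted is left as a hypothetical hook. `LogWatchdog` is a hypothetical name. This only works as a Plugwise health check if Plugwise items are the main source of log updates; otherwise other bindings will keep the file fresh.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;

// Hypothetical watchdog: decides whether openHAB should be restarted because the
// log it watches has not been written to for longer than maxSilence.
final class LogWatchdog {
    private final Path logFile;        // e.g. logs/events.log (assumed location)
    private final Duration maxSilence;

    LogWatchdog(Path logFile, Duration maxSilence) {
        this.logFile = logFile;
        this.maxSilence = maxSilence;
    }

    // Pure decision logic, kept separate from file I/O so it is easy to test.
    static boolean isStale(Instant lastUpdate, Instant now, Duration maxSilence) {
        return Duration.between(lastUpdate, now).compareTo(maxSilence) > 0;
    }

    // True when the log file's last-modified time is older than maxSilence.
    boolean shouldRestart(Instant now) throws IOException {
        Instant lastModified = Files.getLastModifiedTime(logFile).toInstant();
        return isStale(lastModified, now, maxSilence);
    }

    public static void main(String[] args) throws IOException {
        Path log = Path.of(args.length > 0 ? args[0] : "logs/events.log");
        if (!Files.exists(log)) {
            System.out.println("Log file not found: " + log);
            return;
        }
        LogWatchdog watchdog = new LogWatchdog(log, Duration.ofMinutes(10));
        if (watchdog.shouldRestart(Instant.now())) {
            // Hypothetical hook: replace with your own restart mechanism (service, script, ...).
            System.out.println("Log is stale; restart openHAB here");
        }
    }
}
```

Run periodically, for example from the Windows Task Scheduler, this checks the log once per invocation rather than keeping a long-running process alive.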

wborn · October 12, 2016, 7:52pm

OK, thanks for the info. When everything works well with Source, I think the issue is most likely in the binding implementation or maybe in the serial library it uses. I haven’t tried the binding on Windows myself, but maybe someone else has? There could be some platform-specific code in the binding that causes this issue on Windows.

You could also run both Source and openHAB, and use the HTTP Binding to get info from/to the internal Source webserver.

bryeng · October 13, 2016, 9:32am

Thank you for answering. Okay, perhaps I will try that!

bryeng · October 25, 2016, 2:44pm

I did it, and it is working perfectly. Thank you for the advice.

wborn · October 25, 2016, 8:20pm

That’s good news! It may not be the most elegant solution, but it should be rock solid. I do still wonder: did you also check whether your Circles have the most recent firmware (from 2011)?

bryeng · October 26, 2016, 6:27am

Yes, the firmware is the most recent

wborn · October 31, 2016, 4:35pm

@bryeng I finally found some time to run the binding on Windows today. It seems that the “Plugwise protocol message error” occurs more frequently on Windows than on Linux: twice within a couple of hours. I tested this with exactly the same Plugwise mesh/PC/openHAB configuration, so it does seem to be OS-related in some way.

wborn · November 19, 2016, 8:44pm

@bryeng, @frederic Last week I ran into the “Plugwise protocol message error” on Linux for a second time, which caused the binding to stop working. So I investigated it and made some fixes for it with PR #4797.

It has not recurred on my setup with this fix in place. On Windows, where I normally run into it after a few hours, it also seems to be gone.

I’ve uploaded a precompiled version of the binding you can use until the PR has been merged and is also part of the normal openHAB distribution (usually after a day or so).

bryeng · November 24, 2016, 9:42am

Thank you! I will try it out!

Do you know what was wrong?

wborn · November 24, 2016, 8:42pm

Each received protocol message should normally end with the CR LF characters. The issue seemed to be that the binding did not make sure the LF character was also properly received. That caused the protocol message errors: the LF character of one message would end up in front of the next received message.
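A minimal sketch of that framing rule, assuming the bytes arrive in arbitrary chunks: bytes are buffered across reads, and a message is only emitted after both CR and LF have been seen. A LF that arrives in a later read is therefore consumed as part of the terminator instead of being prepended to the next message. `CrLfFramer` is hypothetical, ignores the rest of the Plugwise message format (headers, checksums), and is not the code from PR #4797.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical CR LF framer: a message is complete only once CR *and* LF have
// been received, even if they arrive in different reads.
final class CrLfFramer {
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
    private boolean sawCr = false; // last byte seen was CR, waiting for LF

    // Feeds one chunk of received bytes; returns the messages it completed (without CR LF).
    List<String> feed(byte[] chunk, int length) {
        List<String> messages = new ArrayList<>();
        for (int i = 0; i < length; i++) {
            byte b = chunk[i];
            if (sawCr && b == '\n') {
                // Full terminator seen: emit the buffered message and start a new one.
                messages.add(new String(pending.toByteArray(), StandardCharsets.US_ASCII));
                pending.reset();
                sawCr = false;
                continue;
            }
            if (sawCr) {
                // CR not followed by LF: it was data, keep it.
                pending.write('\r');
                sawCr = false;
            }
            if (b == '\r') {
                sawCr = true;
            } else {
                pending.write(b);
            }
        }
        return messages;
    }

    public static void main(String[] args) {
        CrLfFramer framer = new CrLfFramer();
        // The CR and LF of the first message arrive in different reads.
        byte[] first = "0000AA2000C11423\r".getBytes(StandardCharsets.US_ASCII);
        byte[] second = "\n0000C1\r\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(framer.feed(first, first.length));   // prints []
        System.out.println(framer.feed(second, second.length)); // prints [0000AA2000C11423, 0000C1]
    }
}
```

A reader that treated CR alone as the end of a message would instead return "\n0000C1" for the second message, which matches the symptom described above.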

Also the thread that sends messages waited until it received an AcknowledgeMessage. If this message got lost due to a protocol message error, the send thread would wait indefinitely. As a result the binding could no longer poll the state of Circles or switch them ON/OFF.

So the root cause of the protocol message errors should now be addressed. Furthermore the thread that sends messages now waits at most for 1 second for an AcknowledgeMessage. The binding should now be able to recover from an AcknowledgeMessage that gets lost for some other reason.
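The bounded wait described above can be sketched with a `java.util.concurrent.Semaphore`: the receive thread releases a permit for each AcknowledgeMessage, and the send thread waits at most a given timeout (1 second in the binding, per the description above) for it. `AckWaiter` and its method names are hypothetical. Unlike a real implementation probably should, this sketch does not match acknowledgements to specific messages.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: bounded wait for an acknowledgement between a send and a receive thread.
final class AckWaiter {
    private final Semaphore acks = new Semaphore(0);

    // Called by the receiving thread whenever an AcknowledgeMessage arrives.
    void onAcknowledge() {
        acks.release();
    }

    // Sends a message and waits at most `timeout` for an acknowledgement.
    // Returns false on timeout, so the send thread can continue (or retry) instead of hanging.
    boolean sendAndAwaitAck(Runnable send, long timeout, TimeUnit unit) {
        acks.drainPermits(); // forget late acknowledgements for earlier messages
        send.run();
        try {
            return acks.tryAcquire(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt status for the caller
            return false;
        }
    }

    public static void main(String[] args) {
        AckWaiter waiter = new AckWaiter();
        // Simulate a Stick that acknowledges immediately.
        System.out.println(waiter.sendAndAwaitAck(waiter::onAcknowledge, 1, TimeUnit.SECONDS)); // prints true
        // Simulate a lost acknowledgement: returns after the timeout instead of blocking forever.
        System.out.println(waiter.sendAndAwaitAck(() -> { }, 1, TimeUnit.SECONDS)); // prints false
    }
}
```

Returning a boolean leaves the choice between retrying and moving on to the next queued message to the caller.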

The reason it occurred more frequently on Windows is that reading from a serial input stream does not block on that OS. That increased the likelihood of the LF character ending up in the next message.