I am having a problem with the Plugwise binding: it stops working after some random amount of time. I have tried to touch the Plugwise binding file, but it fails again.
The error message I get is: “Failed to schedule quartz-job”.
My suspicion is that the Quartz thread queue is full and scheduling therefore fails. Is there any way to reset Quartz while openHAB is running?
If not, is there any possibility to have a rule that restarts the whole program if it stops?
I have a similar problem: after a while (somewhere between 5 and 15 minutes), the Plugwise binding stops.
In my logs I have something like this:
:34:34.936 [ERROR] [inding.plugwise.internal.Stick:445 ] - Plugwise protocol message error: 00001A3B00C1B88A
13:34:34.950 [DEBUG] [inding.plugwise.internal.Stick:441 ] - Plugwise protocol header error: 000D in message 000D6F0001A4078D
That is, all kinds of protocol errors.
When I restart it from the OSGi console, I get the “Failed to schedule quartz-job” error message, but things start working again.
16-10-05 14:37:30.307 [ERROR] [inding.plugwise.internal.Stick] - Plugwise protocol message error: ???003F25C2000D6F0001A4078D0C251D0201457A96AA
2016-10-05 14:37:30.322 [ERROR] [inding.plugwise.internal.Stick] - Plugwise protocol message error:
2016-10-05 14:37:30.397 [INFO ] [runtime.busevents ] - Lamp_Switch state updated to ON
2016-10-05 14:37:30.398 [INFO ] [runtime.busevents ] - Nevara_Switch state updated to ON
2016-10-05 14:38:15.977 [INFO ] [.service.AbstractActiveService] - Plugwise Refresh Service has been shut down
2016-10-05 14:38:33.687 [ERROR] [b.plugwise.internal.CirclePlus] - Error scheduling Circle+ setClock Quartz Job
org.quartz.ObjectAlreadyExistsException: Unable to store Job : ‘Plugwise.000D6F0001A4078D-SetCirclePlusClock’, because one already exists with this identification.
For the moment I’ll try to monitor the logs and relaunch the OSGi module from an external cron script.
I used the patched binding for 1.9.0 together with the serial for 1.8.3.
Hope that helps.
I think those protocol errors should not stop the module from working. I don’t know if that can be done.
@bryeng is the missing information you refer to the current power? I’ve fixed that with #4669 less than a week ago. You can download an updated version of the binding with this fix from the openHAB build server.
I also get the “Error scheduling Circle+ setClock Quartz Job” exception. But it is thrown only once at startup and does not cause any major failures for me. It would be better if it were fixed, though.
Thank you for the response.
That is correct. I tried your binding, but I still get the same error:
2016-10-11 10:00:41.733 [ERROR] [inding.plugwise.internal.Stick] - Plugwise protocol message error: 0000AA2000C11423
Does anyone have any idea what the error might be? Or why it happens?
I’ve seen this error only once so far. When it occurred I restarted openHAB to fix it. Here are some ideas on what may be causing it:
Outdated firmware in the Circles; you can check the firmware version in Source. It should be from 2011 or newer.
USB power issues, a common problem when using a cheap power adapter with a Raspberry Pi; test a better USB power adapter or another PC.
An OS USB driver issue; on Linux something may be logged about it in /var/log/syslog or /var/log/kern.log. Try another or fixed OS/PC.
Faulty USB port or chip; test with another port/PC, or see if another device does work on the same port.
Some other program that changes the serial port settings. When the error occurs you could check if the port settings are still correct. The correct ones are initialized in org.openhab.binding.plugwise.internal.Stick.initialize()
Something in the Stick itself that causes it. Perhaps Source can recover from it by re-initializing the Stick. I’ve also seen Source temporarily lose its connection with the Stick, but only occasionally and not every 15 minutes. The Source software may also be a bit smarter at recovering from corrupted messages.
Because I rarely have this issue, I think in my case the last scenario occurred. In your case it is probably one of the others.
How many Plugwise devices are you using? Back when I was using Plugwise it was very unstable (there should be some posts from me here), and I was only able to improve it by rebooting my openHAB server each night. I had about 35 plugs etc., and it might indeed have been the number of devices or the mesh of my ZigBee network that caused these issues. Meanwhile I have replaced all my Plugwise devices with Z-Wave ones and use a Zipabox for these Z-Wave devices, polling the status from openHAB now and then, and I’m not facing these issues anymore. It’s not the configuration I originally intended, but as openHAB does not yet support Z-Wave security, it was the best compromise I could get for now.
@bryeng can you tell us a bit about your setup? Is it Linux/Windows? Using a mini PC like a Pi? Does it all work well with Source?
I’ve just created PR #4700 that should fix the “Error scheduling Circle+ setClock Quartz Job” error. That will probably not fix anything for you, though, because it was a caught and logged exception anyway. More importantly, this PR improves switching Circles by making it faster and more reliable.
I run openHAB on a computer with Windows 7 and everything works fine with Source.
I have four Circles near the computer, so distance should not be an issue.
Currently the interval is 200 ms on the ZigBee network.
I have tried a few different methods to restart the Plugwise binding.
First I changed the modification date of the .jar file, which forces the binding to reload. That resulted in “Failed to schedule Quartz job”, and the refreshing also stopped.
Then I tried to restart the binding from the OSGi command line. After that the Plugwise devices started to send information again, but only the time stamps…
The same plugwise devices have also been used by my professor who experienced the same problems.
I am thinking of making a “quick fix” which just checks the log for updates and, when Plugwise hangs, restarts openHAB.
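A minimal sketch of what such a log check could look like, assuming each log line starts with a `yyyy-MM-dd HH:mm:ss.SSS` timestamp as in the excerpts above. The class name `PlugwiseLogWatchdog`, the marker text and the 5-minute threshold are all hypothetical; the actual restart is left to whatever script calls this and looks at the exit code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of openHAB or the Plugwise binding.
final class PlugwiseLogWatchdog {
    // Assumed timestamp layout, e.g. "2016-10-05 14:37:30.397"
    private static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");
    private static final int TS_LENGTH = 23;

    /** Returns the timestamp of the last line that contains the marker text, if any. */
    static Optional<LocalDateTime> lastActivity(List<String> lines, String marker) {
        for (int i = lines.size() - 1; i >= 0; i--) {
            String line = lines.get(i);
            if (line.length() < TS_LENGTH || !line.contains(marker)) {
                continue;
            }
            try {
                return Optional.of(LocalDateTime.parse(line.substring(0, TS_LENGTH), TS));
            } catch (DateTimeParseException e) {
                // Line does not start with a full timestamp; keep searching backwards.
            }
        }
        return Optional.empty();
    }

    /** True when no matching line exists or the last one is older than maxSilence. */
    static boolean isStalled(List<String> lines, String marker,
                             LocalDateTime now, Duration maxSilence) {
        return lastActivity(lines, marker)
                .map(last -> Duration.between(last, now).compareTo(maxSilence) > 0)
                .orElse(true);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: PlugwiseLogWatchdog <logfile> <marker text>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        boolean stalled = isStalled(lines, args[1], LocalDateTime.now(), Duration.ofMinutes(5));
        // Exit code 1 tells the calling script that a restart is needed.
        System.exit(stalled ? 1 : 0);
    }
}
```

The marker should be something that only shows up while the binding is alive, for example the name of an item that is updated by a Circle. A scheduled task or cron job could run this every few minutes and restart openHAB when the exit code is 1.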
OK, thanks for the info. If everything works well with Source, I think the issue is most likely in the binding implementation or maybe in the serial library it uses. I haven’t tried the binding on Windows myself, but maybe someone else has? There could be some platform-specific code in the binding that causes this issue on Windows.
You could also run both Source and openHAB, and use the HTTP Binding to exchange info with the internal Source webserver.
@bryeng I finally found some time to run the binding on Windows today. It seems that the “Plugwise protocol message error” occurs more frequently on Windows than it does on Linux: twice in a couple of hours. I tested this with exactly the same Plugwise mesh/PC/openHAB configuration. So it does seem to be OS related in some way.
@bryeng, @frederic Last week I had a 2nd encounter with the “Plugwise protocol message error” on Linux which caused the binding to stop working. So I investigated it and made some fixes for it with PR #4797.
It has not reoccurred on my setup with this fix in place. On Windows, where I normally run into it after some hours, it also seems to be gone.
I’ve uploaded a precompiled version of the binding you can use until the PR has been merged and is also part of the normal openHAB distribution (usually after a day or so).
Each received protocol message should normally end with the CR LF characters. The issue seemed to be that the binding did not make sure the LF character was also properly received. The LF character of a message would then end up in front of the next received message, and that caused the protocol message errors.
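To illustrate the idea (this is not the code from the PR, just a sketch under my own assumptions), a receiver can treat a message as complete only once both CR and LF have arrived, even when they come in separate reads. The class name `CrLfFramer` and its `feed` method are made up for this example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; the real binding is structured differently.
final class CrLfFramer {
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
    private boolean lastWasCr = false;

    /**
     * Feeds the first 'length' bytes of 'chunk' and returns every message that
     * was completed by them, without the trailing CR LF.
     */
    List<String> feed(byte[] chunk, int length) {
        List<String> messages = new ArrayList<>();
        for (int i = 0; i < length; i++) {
            byte b = chunk[i];
            if (lastWasCr && b == '\n') {
                // CR LF is complete: emit everything before the CR.
                byte[] raw = pending.toByteArray();
                messages.add(new String(raw, 0, raw.length - 1, StandardCharsets.US_ASCII));
                pending.reset();
                lastWasCr = false;
            } else {
                pending.write(b);
                lastWasCr = (b == '\r');
            }
        }
        return messages;
    }

    public static void main(String[] args) {
        CrLfFramer framer = new CrLfFramer();
        byte[] first = "0000AA\r".getBytes(StandardCharsets.US_ASCII);
        byte[] second = "\n0013BB\r\n".getBytes(StandardCharsets.US_ASCII);
        // The LF of the first message only arrives with the second read.
        System.out.println(framer.feed(first, first.length));
        System.out.println(framer.feed(second, second.length));
    }
}
```

Because the framer keeps its state between calls, a late LF completes the previous message instead of being glued to the front of the next one.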
Also the thread that sends messages waited until it received an AcknowledgeMessage. If this message got lost due to a protocol message error, the send thread would wait indefinitely. As a result the binding could no longer poll the state of Circles or switch them ON/OFF.
So the root cause of the protocol message errors should now be addressed. Furthermore, the thread that sends messages now waits at most 1 second for an AcknowledgeMessage. The binding should now be able to recover from an AcknowledgeMessage that gets lost for some other reason.
The reason it occurred more frequently on Windows is that reading from a serial input stream does not block on that OS. That increased the likelihood of the LF character ending up in the next message.
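The bounded wait can be sketched roughly as follows. This is a simplified illustration, not the actual code of the PR: `AckWaiter`, `onAcknowledge` and `awaitAck` are names I made up, and matching acknowledgements by an integer sequence number is an assumption for the example.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical example class; the real binding is structured differently.
final class AckWaiter {
    private final BlockingQueue<Integer> acks = new LinkedBlockingQueue<>();

    /** Called by the receive thread for every acknowledgement it parses. */
    void onAcknowledge(int sequenceNumber) {
        acks.offer(sequenceNumber);
    }

    /**
     * Called by the send thread after writing a message. Returns true when the
     * expected acknowledgement arrives in time, false on timeout or interrupt,
     * so the send thread can never wait indefinitely.
     */
    boolean awaitAck(int expected, long timeout, TimeUnit unit) {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        try {
            while (true) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    return false;
                }
                Integer seq = acks.poll(remaining, TimeUnit.NANOSECONDS);
                if (seq == null) {
                    return false; // timed out, the acknowledgement was probably lost
                }
                if (seq == expected) {
                    return true;
                }
                // Acknowledgement for another message: ignore it and keep waiting.
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        AckWaiter waiter = new AckWaiter();
        waiter.onAcknowledge(7);
        System.out.println(waiter.awaitAck(7, 1, TimeUnit.SECONDS));
        System.out.println(waiter.awaitAck(8, 1, TimeUnit.SECONDS));
    }
}
```

On a timeout the send thread can log the lost acknowledgement and simply continue with the next message, which is what lets polling and switching carry on.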