I just tried using the 1.9.0 binding in my OH1 instance. It worked for me there. Oddly, after I tried that, my OH2 started to download the database from the hub without problem.
I think my issue may have been a conflict of running OH1 and OH2 on the same server with different versions of the binding. I’ve shutdown the OH1 and restarted OH2 and no longer having an issue.
Under InsteonTerminal, it takes a couple of minutes to download the database but it succeeds every time. It’s like the version in the binding is doing about the same but fails. Is there any additional debug available for modem database download phase?
Is it perhaps hitting a timeout?
Edit: this time, after about 35 minutes it succeeded in a complete download.
Later, I’ll try starting openhab up with my echos powered off.
In the source, file: ModemDBBuilder.java I found the following which would completely explain my problem:
private int m_timeoutMillis = 120000;
Doubling that timeout would almost certainly work for me as the insteonTerminal program completes the download in about 2 minutes and 13 seconds on my 4.4GHz Ivy bridge server. I would guess a system with more devices would take even longer though.
How difficult is it to set up an environment to do test builds? It looks like the code is pre-compiled, as I can’t see a file of that name on my target system.
By the way, powering down my Amazon Echos doesn’t seem to make any significant difference as to how long this takes.
Although I’m an experienced kernel guy, I know nothing about java.
Nice find. The simplest approach would be clone the git repo (https://github.com/openhab/openhab), make sure you have maven installed, go to openhab/bundles/binding/org.openhab.binding.insteonplm and run ‘mvn clean install’, and you should have a jar file in the target directory.
I think the InterruptedException will only kick in when the thread is really interrupted, not when the timeout expires (which is what evidently happens here).
Should we bump the default timeout? I don’t know. I’m not sure the rest of Scott’s setup is correct, and I don’t want to bump the default timeout, thereby masking other setup problems (although I have no idea what possibly could cause such a slowdown). My sense is that it should easily download the full database in a few seconds. Why does it take so long? Maybe Scott can investigate using the InsteonTerminal. What is the bottleneck? I should also compare this to the download speed I’m seeing with my hub but I know that’s not gonna happen because I’m too busy.
If the speed with insteon terminal is the same, then I can’t come up with anything reasonable explanation as to why it should be so slow. In insteonterminal you can simply hack a print statement somewhere into the python code that tells you whenever a new line of the modem database comes in. At least that would tell us if it stalls occasionally or if the messages come in at regular intervals. If they do, at what frequency do messages arrive?
I modified the modem DB builder python code to print the records as they come in.
They arrive right around once per second, at a very consistent rate. Since it produces a total of 124 records, that takes just over 2 minutes to complete.
Thanks, that was very helpful!
This means that it’s pulling one record per tcp poll, which happens at 1sec interval. I had thought it would pull multiple records, but not so.
Now how to fix it? We should really only bump the timeout for the hub2. PLM and hub download much faster.
Probably the best would be to have something like “estimated_number_of_devices” in the config file, and then have the binding calculate the timeout from there.
@Bernd_Pfrommer, what if the code for building the database is refactored from an overall timeout for all the link records to a timeout per link record request?
As an experiment, I tried changing the timeout value for insteon-terminal in init.py.
The optimal value for my system seems to be around 250 mS. More than than just slows it down and less results in more retries, also slowing it down.
At 250 mS, the time to load the DB goes down from 125 seconds or so to 37 seconds.
Duh, that seems like the best idea. Wonder how much work that is to do. I’m more afraid of the testing actually, since I don’t have an old hub. And it was for the old hub that we put this in from what I remember.
Is this normal? Does it happen with a real PLM (instead of a hub2)? It’s always with the 0x42 value.
I also occasionally get the following, but I assume that’s normal for the insteon binding:
15:01:14.354 [WARN ] [steonplm.internal.message.MsgFactory] - incoming message does not start with 0x02
15:01:14.356 [WARN ] [ding.insteonplm.internal.driver.Port] - bad data received: incoming message does not start with 0x0 2
15:01:14.358 [WARN ] [ding.insteonplm.internal.driver.Port] - got bad data back, must assume message was acked.
That could be one of the reasons why you are getting the corrupt messages. I suspect what happens is that the binding polls the hub while only half the message has made it from the PLM modem into the hub’s internal buffer. The more frequently you poll, the more likely you get corrupt data messages.