Need help with insteonplm binding on latest OH 2 snapsot [SOLVED]

jaym · December 31, 2016, 11:19pm

I just tried using the 1.9.0 binding in my OH1 instance. It worked for me there. Oddly, after I tried that, my OH2 started to download the database from the hub without problem.

I think my issue may have been a conflict of running OH1 and OH2 on the same server with different versions of the binding. I’ve shutdown the OH1 and restarted OH2 and no longer having an issue.

Bernd_Pfrommer · December 31, 2016, 11:55pm

Yeah, no idea what will happen if you run two bindings at the same time. They will be sending polls to the same hub … shudder.

Noircogi · January 1, 2017, 12:33am

I used the InsteonTerminal to remove a stale device from the modem’s table, but that didn’t resolve anything.

It seems like the binding is running while it’s trying to download the modem database, because I get sequences like this:

16:03:52.644 [WARN ] [onplm.internal.device.ModemDBBuilder] - modem database download unsuccessful, restarting!
16:05:52.647 [WARN ] [onplm.internal.device.ModemDBBuilder] - modem database download unsuccessful, restarting!
16:07:52.649 [WARN ] [onplm.internal.device.ModemDBBuilder] - modem database download unsuccessful, restarting!
16:09:52.651 [WARN ] [onplm.internal.device.ModemDBBuilder] - modem database download unsuccessful, restarting!
16:11:52.377 [INFO ] [g.insteonplm.InsteonPLMActiveBinding] - devices:   0 configured,   0 polling, msgs received:   616
16:11:52.653 [WARN ] [onplm.internal.device.ModemDBBuilder] - modem database download unsuccessful, restarting!
16:13:52.655 [WARN ] [onplm.internal.device.ModemDBBuilder] - modem database download unsuccessful, restarting!
16:15:52.657 [WARN ] [onplm.internal.device.ModemDBBuilder] - modem database download unsuccessful, restarting!
16:17:52.660 [WARN ] [onplm.internal.device.ModemDBBuilder] - modem database download unsuccessful, restarting!
16:19:52.662 [WARN ] [onplm.internal.device.ModemDBBuilder] - modem database download unsuccessful, restarting!
16:21:52.380 [INFO ] [g.insteonplm.InsteonPLMActiveBinding] - devices:   0 configured,   0 polling, msgs received:   616
16:21:52.664 [WARN ] [onplm.internal.device.ModemDBBuilder] - modem database download unsuccessful, restarting!

Under InsteonTerminal, it takes a couple of minutes to download the database but it succeeds every time. It’s like the version in the binding is doing about the same but fails. Is there any additional debug available for modem database download phase?

Is it perhaps hitting a timeout?

Edit: this time, after about 35 minutes it succeeded in a complete download.

Later, I’ll try starting openhab up with my echos powered off.

Noircogi · January 2, 2017, 8:57am

In the source, file: ModemDBBuilder.java I found the following which would completely explain my problem:

private int m_timeoutMillis = 120000;

Doubling that timeout would almost certainly work for me as the insteonTerminal program completes the download in about 2 minutes and 13 seconds on my 4.4GHz Ivy bridge server. I would guess a system with more devices would take even longer though.

How difficult is it to set up an environment to do test builds? It looks like the code is pre-compiled, as I can’t see a file of that name on my target system.

By the way, powering down my Amazon Echos doesn’t seem to make any significant difference as to how long this takes.

Although I’m an experienced kernel guy, I know nothing about java.

ranielsen · January 2, 2017, 2:29pm

Nice find. The simplest approach would be clone the git repo (https://github.com/openhab/openhab), make sure you have maven installed, go to openhab/bundles/binding/org.openhab.binding.insteonplm and run ‘mvn clean install’, and you should have a jar file in the target directory.

If you want an IDE set up, you can follow the instructions at: http://docs.openhab.org/developers/contributing/contributing

Noircogi · January 2, 2017, 11:12pm

Thank you! I was able to compile the code, but better yet, it turns out there’s a built-in config setting for this:

I put the following in my insteonplm.cfg file and the DB successfully downloads first time.

modem_db_retry_timeout=240000

Now I can move on to the task of configuring my items.

ranielsen · January 3, 2017, 3:23pm

Good news, @Bernd_Pfrommer do you think the default timeout should be increased? BTW, I’m surprised that the message modem db builder thread interrupted isn’t in the log file. If you look at https://github.com/openhab/openhab/blob/master/bundles/binding/org.openhab.binding.insteonplm/src/main/java/org/openhab/binding/insteonplm/internal/device/ModemDBBuilder.java#L65 it has:

@Override
public void run() {
    logger.trace("starting modem db builder thread");
    while (!isComplete()) {
        startDownload();
        try {
            Thread.sleep(m_timeoutMillis); // wait for download to complete
        } catch (InterruptedException e) {
            logger.warn("modem db builder thread interrupted");
            break;
        }
        if (!isComplete()) {
            logger.warn("modem database download unsuccessful, restarting!");
        }
    }
    logger.trace("exiting modem db builder thread");
}

Bernd_Pfrommer · January 3, 2017, 4:02pm

I think the InterruptedException will only kick in when the thread is really interrupted, not when the timeout expires (which is what evidently happens here).

Should we bump the default timeout? I don’t know. I’m not sure the rest of Scott’s setup is correct, and I don’t want to bump the default timeout, thereby masking other setup problems (although I have no idea what possibly could cause such a slowdown). My sense is that it should easily download the full database in a few seconds. Why does it take so long? Maybe Scott can investigate using the InsteonTerminal. What is the bottleneck? I should also compare this to the download speed I’m seeing with my hub but I know that’s not gonna happen because I’m too busy.

Noircogi · January 3, 2017, 6:09pm

What would you like me to check?

My network now seems to be 100% reliable and the ModemDB download time under Openhab seems to match what I see under insteonterminal.

When I run the download under insteonterminal, i don’t see any output until it completes, then the whole DB is listed at once.

Bernd_Pfrommer · January 4, 2017, 1:36pm

If the speed with insteon terminal is the same, then I can’t come up with anything reasonable explanation as to why it should be so slow. In insteonterminal you can simply hack a print statement somewhere into the python code that tells you whenever a new line of the modem database comes in. At least that would tell us if it stalls occasionally or if the messages come in at regular intervals. If they do, at what frequency do messages arrive?

Noircogi · January 4, 2017, 2:24pm

I modified the modem DB builder python code to print the records as they come in.
They arrive right around once per second, at a very consistent rate. Since it produces a total of 124 records, that takes just over 2 minutes to complete.

Bernd_Pfrommer · January 4, 2017, 2:43pm

Thanks, that was very helpful!
This means that it’s pulling one record per tcp poll, which happens at 1sec interval. I had thought it would pull multiple records, but not so.
Now how to fix it? We should really only bump the timeout for the hub2. PLM and hub download much faster.
Probably the best would be to have something like “estimated_number_of_devices” in the config file, and then have the binding calculate the timeout from there.

ranielsen · January 4, 2017, 5:08pm

@Bernd_Pfrommer, what if the code for building the database is refactored from an overall timeout for all the link records to a timeout per link record request?

Noircogi · January 4, 2017, 7:25pm

As an experiment, I tried changing the timeout value for insteon-terminal in init.py.
The optimal value for my system seems to be around 250 mS. More than than just slows it down and less results in more retries, also slowing it down.

At 250 mS, the time to load the DB goes down from 125 seconds or so to 37 seconds.

Noircogi · January 4, 2017, 7:45pm

I verified the same change works in the insteonplm binding (poll_time=250 in the init file) and time to download the database goes down to 33 seconds.

Bernd_Pfrommer · January 5, 2017, 2:03am

Duh, that seems like the best idea. Wonder how much work that is to do. I’m more afraid of the testing actually, since I don’t have an old hub. And it was for the old hub that we put this in from what I remember.

Noircogi · January 18, 2017, 11:35pm

My insteon system has been working well for the last couple of weeks.

I do get the following warning message in my log file several times per day:

15:23:05.622 [WARN ] [steonplm.internal.message.MsgFactory] - unknown command code/ext flag: 0x42
15:23:05.624 [WARN ] [ding.insteonplm.internal.driver.Port] - bad data received: unknown command code/ext flag: 0x42

Is this normal? Does it happen with a real PLM (instead of a hub2)? It’s always with the 0x42 value.

I also occasionally get the following, but I assume that’s normal for the insteon binding:

15:01:14.354 [WARN ] [steonplm.internal.message.MsgFactory] - incoming message does not start with 0x02
15:01:14.356 [WARN ] [ding.insteonplm.internal.driver.Port] - bad data received: incoming message does not start with 0x0                                                                                                            2
15:01:14.358 [WARN ] [ding.insteonplm.internal.driver.Port] - got bad data back, must assume message was acked.

Bernd_Pfrommer · January 19, 2017, 2:06am

That’s a property of 2014 hub. What’s your poll interval set to? Did you put that back to 1000ms?

Noircogi · January 19, 2017, 2:52am

Ok. Thanks.

I’m still using the 250mS poll interval and that seems rock solid. The database downloads nearly 4X as fast.

Bernd_Pfrommer · January 19, 2017, 2:55am

That could be one of the reasons why you are getting the corrupt messages. I suspect what happens is that the binding polls the hub while only half the message has made it from the PLM modem into the hub’s internal buffer. The more frequently you poll, the more likely you get corrupt data messages.

Bernd