Items files not being fully parsed at startup

Tags: #<Tag:0x00007f5c952c8618>
  • Platform information:
    • Hardware: Raspberry Pi 3 Model B Rev 1.2
    • OS: Raspbian GNU/Linux 9 (stretch)
    • Java Runtime Environment: (Zulu Embedded 8.25.0.76-linux-aarch32hf) (build 1.8.0_152-b76)
    • openHAB version: 2.5.4-1

I’ve recently upgraded from OH2.4 to 2.5.4 which, on the whole, went OK. However, I’ve noticed that when the OH service is re-started the ‘items’ files do not always get parsed correctly, and I end up with missing items (appears to be random). If I modify an items file e.g. add a space, and save the file, OH seems to then correctly pick up the missing items. No errors regarding the items files are logged at startup.

I found a similar issue logged back in 2016, but cannot find any recent mentions of similar problems.

2 Likes

If you run openHABian, there is a menu option which delays text file loading for a fixed time. openHAB does not have a startup priority built in.

It certainly still exists, and has got worse with 2.5
(I never suffered running on a Win laptop at 2.4, but do now)
It does crop up a lot, but not usually reported as clearly as you have.
On of the most annoying long-term issues with openHAB, there is no real fix,only workarounds.
It’ll also manifest as stuff like rule Actions missing, etc.

I never had this issue…

I have tree openhab systems running, (well my Rpi4 does only run when I test stuff).
One Rpi3B+ running my main system.
One Odroid C2 running with Zigbee and Velbus.
OneRpi4 which is only turned on for some specific tests, (atm only the ipcamera binding).
Specially the last two test setup are restarted quite alot.

All running openhab 2.5.x (stable versions only). None is using SD cards (except the Rpi4 cause it need it to boot from SSD), only SSD or eMMC (Odroid).

This is not related to the order of loading .rules files. There is something that goes wrong after clearing the cache that messes up loading of .items files. I’ve not seen the problem occur with JSONDB Items but would not be surprised if it manifests there too.

This particular problem only manifests after a clearing of the cache so if you are not updating or manually clearing the cache frequently you wouldn’t see it.

If you wait until OH fully loads and then restart OH that too will clear the problem. It’s only that first start of OH after clearing the cache where this problem appears.

Ahh… I do clear cache quite often (on my test setup´s). But to be honest, I simply igonore the first startup, cause it´s always filled with tons of different errors. So after the first startup has finished, I do another restart, (and sometimes even a third restart), simply to make sure.

Not convinced that is all that is wrong. I routinely get this having never cleared cache nor recently updated. Boot up time example

2020-04-30 09:16:49.359 [ERROR] [ntime.internal.engine.RuleEngineImpl] - Rule 'New Event Log': The name 'lock_log' cannot be resolved to an item or type; line 125, column 3, length 8
2020-04-30 09:16:49.369 [ERROR] [ntime.internal.engine.RuleEngineImpl] - Rule 'New Event Log': The name 'EventLog17' cannot be resolved to an item or type; line 134, column 6, length 10

lock_log is a global object declared at start of file
var java.util.concurrent.locks.ReentrantLock lock_log = new ReentrantLock()
There’s an import just before that of course, but no indication the import fails at xxx.rules load time.

‘EventLog17’ is one of twenty similar String Items defined in an xxx.items file, EventLog00 to 19. It’s really random why one in the middle comes to grief, the rule would have had to process 19 and 18 to discover 17 was missing.

Stranger still, to get there it must have passed the lock_log problem - I do believe the first rule run fails at the lock (it would be attempting to log a bootup message) but the second rule run on another message passes that hurdle.
I’d guess the import is done asynchronously and takes a finite time.

An openHAB stop / start will usually come up clean.
This is happening on a Windows laptop, I expect environment and host performance has a big influence.

Now, this is a development system i.e. it might be off for days and then booted up.
How cache-y is this cache, I wonder?
Does it timeout, become unusable, while system is offline?
I must look into this more closely.

Okay, tried a “hot reboot” from the above situation, those problems cleared of course but instead a new occurrence of an old favourite
"The name 'ON' cannot be resolved to an item or type; line 48, column 24, length 2"
gahh!

Second “hot reboot” … clean as a whistle.
Alright, now I’m beginning to believe the cache thing and that it is time based.

I don’t use Rules DSL any longer so maybe that’s why I’m only seeing Item errors. But I get the Items do not exist errors from both the Rules and the Sitemap when it happens.

I don’t know enough about how Rules DSL works in this regard. In Java, the import actually get’s performed at compile time so my assumption is that it would take place for .rules at load/parse time. In fact, I know it has to because otherwise it wouldn’t be able to know that, for example, the type of your lock_log variable when it loads the file. When you comment out the import, when it gets to that line OH will fail to load the file entirely and complain that it cannot resolve ReentrantLock to an item or type.

So when it gets to that first error it’s already done the import or else it wouldn’t have been able to load the .rules file enough to actually execute it and generate that error.

My theory, which obviously isn’t based on the actual code, is that when you clear the cache certain stuff takes longer to load compared to normal because they have to be redownloadedand installed. This throws off the timing and prevents or overwrites or something like that the loading of the text configs. The reason a restart of OH clears it is because the cache is populated and it doesn’t have to wait as long for that stuff to load.

But there is another and similar timing issue that may also come into play at the same time or on subsequent loads where the .rules files start loading and more importantly start executing before the Rules engine is ready for them to execute. The “'ON cannot be resolved” and perhaps even your global variable cannot be resolved error is often a symptom of that problem.

So in your case you might have hit both errors at the same time, or hit one error one time and the other error the second time. It’s hard to say. But in the past, delaying the .rules files loading until well after OH has started (or using Scripted Automation) has usually corrected the problems with variables and constant enums (e.g. ON/OFF) from being recognized. But it does not correct the problem where you bring up the sitemap after booting after a clearing of the cache and half of the Items are not recognized.

My Rpi4 can be off for several days, even a coupple of weeks… I see no differences with that. However, I only have very few items on that system, since I´m doing some testing for the ipcamera binding.

I can see how those could be related, somehow the global namespace is not ready before rules are run.
It’s harder to see how rules could be “compiled” (and registered for events, and triggered!!) before that global namespace is complete.

Especially when we know the very first run of a rule is often delayed, assumed to be some kind of lazy loading / compiling at work.

It really shouldn’t be this difficult just to boot up this system :crazy_face:

1 Like

I agree.
What makes this a “less” serius matter is, that it only happens after a clean cache… If it happened on all reboots/restarts, then it wouldnt be acceptable at all.

more to look at here, but I’m suspecting it also happens after some time off the air (cache goes stale).
That does not affect the average OH installation, of course :smiley:

It’s really weird that it takes two goes to populate it - I suppose some part of caching relies on something else to be cached. That shouldn’t be that hard either.

There is lots of history to all this, and I appreciate it seems hard to tackle, an inherent problem of the framework

1 Like

To be specific, Karaf upon which all of this stuff runs. This was never a problem in OH 1.x, though the way boot order/timing was handled there was less than satisfactory too.

I think it’s because it get’s registered as the fist step in being loaded, or at least that is what appears to be the case. But delaying the loading of the .rules files will prevent the no symbol or Item error. This strongly implies that there is something going on where Rules are somehow triggered before the engine is ready to actually run the Rule.

When an Item simply doesn’t appear to exist (e.g. both rules and sitemap complain that it’s not there long after OH has finished booting) I think is a different problem and is unrelated to the .rules files loading. My evidence for this is the fact that the sitemap will complain about missing Items that are never referenced in rules.

Both cases do appear to be related to timing of the boot sequence which is really hard/impossible to control well on Karaf.

Other bits of evidence the meaning of which is TBD:

  • the no symbol or Item error does not occur in Scripted Automation
  • I’ve never seen a report of the no symbol or Item error occurring when Items are in JSONDB even when using .rules files
  • the no symbol or Item error doesn’t occur when the .rules files are delayed from loading until after OH has come up
  • I have seen on one occasion (haven’t been able to reproduce, could have been something else as it was reported by someone else) the Item doesn’t exist problem even when using Scripted Automation and JSONDB defined Items
  • I routinely see the Item doesn’t exist problem with Scripted Automation but Items are defined in .items files
  • I only see the Item doesn’t exist problem when the cache has been cleared
  • I don’t have a good measure yet, but it seems the Item doesn’t exist problem mainly occurs on more powerful machines (i.e. not RPis).

This makes me think they are separate problems. They are both related to timing. But they are different timing problems.

1 Like

I totally agree… I dont understand it either.

I have only read some of the posts in the thread… But it seems like what they´re discussiing in there, havn´t changed much today.
I cant say I have a huge knowledge on this, but I wonder how come it isn´t possible to startup/shutdown in silence mode using verfification…

It´s probably easier to startup rather than to shut down. Startup doesnt really need a special timing. It needs verfification Just make sure what is started is actually running, (thats verification). When all services is running and verified, go to next step, things/items, again with verification. After that rules/etc. It makes no sense starting or running any rules, unless the prevoius steps have been done and verified.

All this could go on in a cache in total silence, only showing (and log) if there is issues somewhere.

Shut down is far more difficult. It could probably be done in backward order. But since some rules/etc depend on the shutdown as well, its highly difficult. I havn´t got any good idea of this one yet.

The problem as I understand it is that Karaf starts up all the bundles in parallel and doesn’t provide a good mechanism to say “wait for foo to start before trying to start bar”.

A difficulty is that foo may need bar loaded first … but only if rhubarb binding is configured.
The highly modular nature of openHAB works against it here, some thing as apparently simple as “wait till the items are loaded” … well, ok, but there may or may not be an indeterminate number of files and a separate JSONB source.

1 Like

Does any of the bundles depend on each other during start up?
As far as I understand the whole system could be loaded, but keeping things, items and rules untill last, and in that particular order.

There was an issue opened a long time ago (OH 2.0 time frame), work was done on it and it was determined that there was no way to do this in Karaf.

I dont believe that…

Lets look at it in a practical way (and a bit logic to me).
When you start openhab from scratch, there are no bundles loaded. Whatever you do from here, it requires first the bundle/addons, then the thing and then the items. You cant really do it the opposite way, (well, you can when using config files, but it sure wont work, and it wouldnt make any sense).

Doing exactly the same when restarting after clean cache should be possible… Or at least, thats the logical part to me.

openHAB itself is several Karaf bundles.