Exception handling and logging

Hey guys.
I’m sure everybody ran into one once: In the openhab/openhab2 log you will receive java exceptions almost on a regular basis especially while developing rules or on startup. One example:

Exceptions serve a valuable purpose, but why are they not catched and logged? Continuing the example from above, everybody would benefit from one line instead, e.g.

Error while loading icon file <Sunset-icon>, ...(reason)...

Over the time I saw quite a lot different exceptions in my log and especially while building rules, it would help to have meaningful error messages. Is there any sane reason to have them in raw format?

The fact that you are seeing that in your log means that they are being caught and logged.

I completely agree. And for most exceptions that take place in openHAB core the exceptions are caught and are printed out in a meaningful way. Fortunately the core of OH is actually pretty solid and these exceptions very rarely occur in OH 1.

OH 2/Eclipse SmartHome is undergoing development and full coverage of exceptions is probably not there yet. Welcome to Beta software. Whoever coded the IconServlet probably didn’t anticipate it being passed an icon that isn’t there which caused a NullPointerException. And (this is really low level programming stuff) NullPointerException is a special type of Exception which doesn’t have to be declared as thrown so the calling code isn’t forced to catch it. It is entirely possible the author of that servlet was not even aware that a NullPointerException could even be thrown there so he/she wouldn’t know to try and catch.

Most of the errors that occur are in Rules or other parts of the system that are based on interpreting and running your program. Yes, that is right, your configuration (rules, persistence items, sitemap, scripts, and transformations) are a program. This is a program that runs on top openHAB’s core which runs on top of Java. Neither openHAB’s core nor Java is smart enough to catch and interpret every type of error that could possible occur in your rule. So these exceptions tend to be generic and not always informative.

Can OH do a better job with meaningful exceptions in this case? Of course. But it is never going to be as useful as it needs to be. There isn’t enough information at these levels to make it meaningful.

But if you want truly meaningful exception handling you need to implement it yourself in your rules. Honestly, this is as it should be because it is inside your rules where you know the most and can therefore generate the most meaningful exception text and it is the only place you would have a chance at recovering from the error.

Full disclosure: That was me :wink:

Yeah, let’s say: “didn’t pay enough attention”. So what @rlkoshak says is right: Such exceptions are not at all meant for the user, they simply highlight bugs that should be reported asap (in this special case at https://github.com/kaikreuzer/openhab-core/issues).

The icon problem seems to be more complicated than just the file not being there. I created a ticket for that issue.

Okay, bad choice of words. I was talking about not having the whole stack trace in the openhab.log/karaf console. It’s my believe that these lines just clutter up the log for most users without the benefit of a clear message to work with. Normal end users without development experience will only be confused by “20 lines of gibberish”. Of course there will always be bugs and exceptions in rules will not always be easy to “reprocess” for the user, but I think there should be some reconsideration - for usability reasons!?

My first question got answered: Yes, exceptions should be reported and dealt with.

My follow up suggestion would be to introduce a TRACE log level and reduce exceptions to a one- or twoliner on ERROR level. The default log level will stay at ERROR and developers can switch to TRACE. (The message on ERROR level could even encourage the normal end user to report the problem.)

If we are concerned about making the logs more usable for non developers have anyway lost the usability battle. The log should really only be there for developers and average users should never have to look at them except when asking for help on this forum. Of course should and reality rarely meet. But I think the community would be far better served in making the log file obsolete for average users than silencing time making the logs prettier. My opinion of course.

Also keep in mind that OH 2 is in beta and as such the logs are going to be full of errors sometimes and they will be cryptic and make more sense to developers. That is the nature of beta testing software.

I believe there already is a trace logging level. At least there is for OH 1. I know OH 2 does logging differently so maybe it doesn’t any longer. But again, I think it would be far better to eliminate the need for logging for the average user than make the log pretty.

1 Like

Oh I have to disagree :open_mouth: Why would you say that? An event log is an important information for developers and end users alike. In OH everybody benefits from an event log to efficiently build rules for example. Quite the contrary, I would say the log should be more accessible to everybody. Let me show you a part of the web frontend to my (amazing) wifi system at home:

The event log is directly in the management frontend with the functionality to filter and sort events, mentioned devices can even be clicked to directly get to the details page.

I’m not asking to invest a lot of time into polishing up the log on a webpage (One could do this simply by using logstash/ElasticStack), I merely wanted to point out, that in my believe, (1) full stack traces do not help anybody without context, (2) full stack traces in a otherwise one-line syslog oriented file are irritating and discouraging and (3) other information is needed to actually identify and solve a problem.

I just saw an example demonstrating exactly what I’m talking about: smarthome#1866

Beta software is released to get feedback by a broader audience for the active developers to consider. I’m taking part in that process by pointing out a small possibility for improvement. Thanks.

If our end goal is to produce a home automation product for the masses, then depending on a log file for information is the wrong solution. Logs are wonderful for programmers, tinkerers, and technical people. Logs are an incredibly efficient way to see what is going on in a system. And logs are exceptionally non-user friendly for everyone else in the world. If we build OH in such a way that the logs are required by average users we have failed in that end goal.

This is why I say this. The information that is in the log should be presented to the non-technical end user in some other way or OH will only ever be a project for highly technical tinkerers and not average joe end users.

Admittedly, OH 2 has a long way to go towards this goal and logs will be very important in the meantime. And logging will always be important for developers and will probably be the preferred monitoring and debug approach for technical users.

But IMHO the effort of the community would be better spent finding ways to make the log not required to see what is going on inside the OH server instead of spending the effort cleaning it up. And when I say “not required” what I really mean is to not present a bunch of text flowing by in a console. Almost all of the information that is contained in that text must be available to the users somehow. But a bunch of lines of text, no matter how you make it pretty, is not something average joe users will relate to or understand how to use.

One doesn’t need log files when using Word, or iTunes, or their entire iPhone or Android device, games, or almost any other highly complex piece of software intended for general use. We shouldn’t require it of our average openHAB users either.

And I’m not trying to discourage this discussion. I actually think it is worth considering posting as a discussion on the Eclipse SmartHome project’s forum, as the bulk of the work that would need to be done would actually be done there.

However, I do think you might be underestimating the size of the effort you are proposing and the scope should be identified and defined. For example, do we need to go into each and every library that makes up part of ESH/OH2 and modify their code so they don’t log out stack traces? Do we need to add try/catch to every method everywhere to suppress these stack traces? How proactive do we need to be (e.g. this particular case’s stack trace was caused by a bug, not something anticipated)?

The effort could potentially be huge.

I have to agree with you! A log file should not be a central part of the ideal end product. A great GUI will present all relevant information just right where it’s needed and an old-style log file will move to the shadows. I share your believe there and I think PaperUI is going the right direction to achieve that goal at some point in the future.

I also agree with you that there’s still a long way to go till this point in the future. And as this is true, we should be concerned about the ideas of tomorrow and the problems of today alike (as to the amount of effort, see below).

Comparing openHAB with Word or an iPhone seems a bit far fetched but we could compare it with any other complex controlling system like my wifi system, your everyday router or a linux server system. These surely have an event log but merely anybody is using it on a regular basis because the system provides better ways to manage and present information. Your argument of “not required”.

That said, I think the log (not necessarily log file) will stay relevant for a long time and will slowly loose relevance to the broad audience. But let’s get to the point. It seems like there are a few points this can be broken down to.

  1. Move stack traces to TRACE loglevel (for core and binding/bundle developers, with the focus on solving/handling the causing problems)
  2. Reduce exceptions causing stack traces to oneliners on ERROR loglevel (conformity in the log, clarity for the enduser)
  3. Spend more effort on handling exceptions

Without a better knowledge of the core / the logging component, I would say that the first and second steps are probably not that hard to realize. Somebody please correct me if I’m wrong. I think these two things will have a big influence on the acceptance and appreciation by new and experienced users. By presenting problems in a descriptive and informative way, the user is empowered to fix his mistake or motivated to report a bug.

Regarding the third step, yes. But we probably need to distinguish a bit. Exceptions will be thrown for unanticipated problems, i.e. bugs, we can’t do anything about that but as proposed above, these could be presented in a better way.
Causes for exceptions which are known should be handled by adding try/catch blocks. Isn’t that what every good developer does?
A big source for exceptions are malformed rule definitions as you described above but I’m also not sure how to solve these. Should there be exception handling (and descriptive reporting) for obvious problems like illegal castings in the rule execution engine? Should there be a validator/linter checking a rule file every time it’s changed? How will all of this change with the new PaperUI based experimental rule editor? These are difficult questions and I do also not know the answers to them of course. What do you think?

But it is not always in OH/ESH written code where the exception is caught and logged. That is partially where the effort becomes huge as we would have to go into third party libraries and frameworks and modify them to match with this OH standard. And this isn’t a one time thing as every time the library updates we will have to go in and make these changes again.

Another area where this can become huge is there are certain types of exceptions that do not require they be declared to be thrown. NullPointerException is the most common of these but not the only one. Discovering everywhere that NullPointerException could be thrown and making sure that a nice text string providing context could require a completely code walk of every single class OH depends upon from core Java classes down to OH developed code.

My point is two fold:

  • Depending on how one would scope this effort the amount of effort can become huge now and in the future
  • If you scope it down to something manageable then you will never eliminate all the stack traces

But no matter what, it occurs to me this entire conversation would be best had on the ESH forums, not here. 95% of all the exceptions, the logging policy, and pretty much all of the changes would have to be made in ESH.

I think the exceptions that the average user most likely to see will be caused by some problem in the rules. Unfortunately the obscure nature of these exceptions is a side effect of the way the Rules Engine works. The Rules Engine basically slaps a weakly typed programming language (a customized Xtext) on top of a strongly typed one (Java). As such certain errors (e.g. an inability to convert an object of one type into another type that is needed) end up being a relatively uninformative NullPointerException.

Similarly, the Rules Engine operates on scripts that get turned into a bunch of Java classes in memory and, in the process, lose much if not all of its context so when structural errors occur (e.g. syntax errors, lambda errors, etc) result in some arcane Xtext exception.

In both of these cases the actual exception appears to be caught and printed in library code, not OH/ESH code (I’m unable to test this or check the actual code so I could be completely off base). But even if it is in OH/ESH code, the exception is being caught by generic script running code and because much of the context is lost it is quite challenging to come up with a meaningful text for an exception.

Personally, I think the focus should be:

  • work to eliminate the occurrence of exceptions as much as possible
  • for exceptions that might be thrown because of an error in configuration or environment, try to make these exceptions as informative as possible
  • try to some up with a way to add more context to exceptions thrown by the rules engine

Yes, but as I mention above, this is a challenge, though one I think is well worth doing

This is what Designer is for and why I and many others strong recommend its use. The Rule Editor in PaperUI and Habmin2 would serve a similar purpose. If using one of these tools there should never be a situation where a rule fails to load or run because of a syntax error.