Textual configuration: HTTP binding and RegEx

TL;DR
To match a float between the strings “start” and “end”, use

[stateTransformation="REGEX:.*?start([+-]?([0-9]*[\\.])?[0-9]+)?end.*"]

Explanation
The problem is that RegEx in openHAB does not behave like it does on e.g. regex101.com. In openHAB the expression needs to match the whole string, not just a part of it, and quantifiers such as .* are greedy. Here is an example.
Greedy means that the RegEx engine tries to match a string that is as long as possible. Think of it as if it starts from the end of the string and works its way backwards to find a match.

If you have the string

<html lang='en'><head><meta charset='utf-8'/>

and you want to match just <head>, on regex101.com you would use

>(.*)<

as the regular expression. It starts at the first >, captures everything inside the parentheses and, because .* is greedy, stops at the last <.
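To see the difference outside openHAB, here is a minimal Python sketch (not openHAB code). `re.search` stands in for the regex101.com behaviour, and `re.fullmatch` stands in for openHAB's whole-string requirement; that mapping is my assumption about how the REGEX transformation applies the pattern.

```python
import re

SAMPLE = "<html lang='en'><head><meta charset='utf-8'/>"
PATTERN = r">(.*)<"

# regex101-style: the pattern may match anywhere inside the string.
found = re.search(PATTERN, SAMPLE)
print(found.group(1))

# openHAB-style (assumed): the pattern has to cover the whole string,
# which fails here because the string does not begin with ">".
whole = re.fullmatch(PATTERN, SAMPLE)
print(whole)
```

The first call captures <head>, the second finds no match at all.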

If you use the same RegEx in openHAB, it won't match unless your string begins with a >. Because your string starts with something else, your RegEx needs to account for this by matching everything before and including the >. The RegEx for that is .*>
Now the greedy part is that this expression would match everything up to the last >, not the first one. Therefore you need to make it ungreedy by appending a ? to the quantifier. Your RegEx to match everything up to and including the first > would be

.*?>
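A quick way to check the greedy versus ungreedy prefix is this small Python sketch. Python's `re` module is used only for illustration here; the sample string is the one from above.

```python
import re

SAMPLE = "<html lang='en'><head><meta charset='utf-8'/>"

# Greedy: .* runs to the end and backtracks to the LAST ">".
greedy = re.match(r".*>", SAMPLE).group(0)

# Ungreedy: .*? grows one character at a time and stops at the FIRST ">".
lazy = re.match(r".*?>", SAMPLE).group(0)

print(greedy)
print(lazy)
```

The greedy version swallows the whole sample string, the ungreedy one only the opening html tag.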

Now comes the capturing group, which is some letters. In RegEx that's again .* but in parentheses. Make it ungreedy again, so that it captures as little as possible, by appending a ?. The RegEx for the capturing group is

(.*?)

The group should stop at the next <, so we append a < to the expression.

Now we need to match the rest of the string, just to satisfy openHAB. That's again .* but you can leave it greedy. The whole RegEx in this example would be

.*?>(.*?)<.*

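It also works with multi-line strings. One caveat: in the sample string above the first > is directly followed by a <, so the ungreedy group comes back empty there. The pattern pays off when the text you want sits between two distinct delimiters.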
In general, to match the first expression between “start” and “end” of your string, do:

.*?start(.*?)end.*
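The general pattern can be tried out with the following Python sketch. The helper `regex_transform` is hypothetical and only approximates the REGEX transformation: I assume a whole-string match that returns the first capturing group, and I assume the dot also matches line breaks (`re.DOTALL`), which fits the multi-line behaviour described above.

```python
import re
from typing import Optional


def regex_transform(pattern: str, source: str) -> Optional[str]:
    """Hypothetical stand-in: group 1 of a whole-string match, else None."""
    match = re.fullmatch(pattern, source, flags=re.DOTALL)
    return match.group(1) if match else None


PAGE = "<p>\nstart first end\nstart second end\n</p>"

# Only the first start/end pair is captured, even across several lines.
first = regex_transform(r".*?start(.*?)end.*", PAGE)
print(repr(first))

# No "start" in the input means no match at all.
missing = regex_transform(r".*?start(.*?)end.*", "nothing to see here")
print(missing)
```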

To match a float number between “start” and “end” of your string, do:

.*?start([+-]?([0-9]*[\\.])?[0-9]+)?end.*
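Here is a small Python sketch for testing the float pattern before putting it into a things file. The pattern is copied verbatim from above; the helper `extract_float` and the sample inputs are made up for illustration, and the whole-string match with `re.DOTALL` is my assumption about how openHAB applies it.

```python
import re
from typing import Optional

# Pattern copied verbatim from the post.
FLOAT_PATTERN = r".*?start([+-]?([0-9]*[\\.])?[0-9]+)?end.*"


def extract_float(source: str) -> Optional[str]:
    """Hypothetical helper: the number as a string, or None if nothing fits."""
    match = re.fullmatch(FLOAT_PATTERN, source, flags=re.DOTALL)
    return match.group(1) if match else None


signed = extract_float("foo start-12.5end bar")
integer = extract_float("start42end")
no_leading_digit = extract_float("xx start.5end yy")
not_a_number = extract_float("startabcend")

print(signed, integer, no_leading_digit, not_a_number)
```

Note that the pattern does not allow whitespace between the delimiters and the number, so an input like "start 1.5 end" would not match.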

The channel definition in the things file would then look like this, for example:

Type string : Channel_WebsiteWithFloatNumber "my Number as String: [%s]" [ stateExtension="number.html", stateTransformation="REGEX:.*?start([+-]?([0-9]*[\\.])?[0-9]+)?end.*" ]

Hope that helps someone.
