[SOLVED] Regex parse on Exec binding fails

Hi - this is hopefully yet another stupid error on my part… hopefully someone can help.

I am scraping a web page using the Exec binding.

Exec binding is set as follows:

Command:
/usr/bin/wget -U Mozilla/5.0 -q https://www.accuweather.com/en/gb/manchester/m15-6/minute-weather-forecast/329260 -O -

Transform:
REGEX((?<=","summary":")[\w\s\d]*(?=","updated"))

Seems my regex does not work - yet when I test it on regex101 it is fine: https://regex101.com/r/yHJZn5/1/

What am I missing?

Try that one:
REGEX(.*(?<=","summary":")[\w\s\d]*(?=","updated").*)

Thanks Vincent - I had already tried adding the bits at the beginning and the end, but this fails.
The regex in my example uses lookaheads and lookbehinds to extract just the string from the middle.

I have also attempted going one further and escaping the speech marks:
REGEX(.*(?<=\",\"summary\":\")[\w\s\d]*(?=\",\"updated\").*)

Still no luck.

I found an old doc saying that the REGEX transform adds ^ at the start and $ at the end (though I don’t know if this is still true). If I assume it is, it explains why your idea should resolve the bug.

Unfortunately though, it fails, and it even fails on https://regex101.com/r/yHJZn5/2 unless I append /gm to the end of the pattern: https://regex101.com/r/yHJZn5/3
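
For anyone wondering why the /gm flags make a difference on regex101, here is a rough sketch using Python's re module (not regex101 and not openHAB's Java engine, though these flags behave the same way in both). The page fragment is made up for illustration, and whether the REGEX transform itself applies a DOTALL-style flag is an assumption I have not confirmed.

```python
import re

# Hypothetical multi-line fragment; the real wget output will look different.
page = (
    '<html>\n'
    '<script>var d = {"type":"none","summary":"No rain for 120 min","updated":"12:00"};</script>\n'
    '</html>'
)

core = r'(?<=","summary":")[\w\s\d]*(?=","updated")'
wrapped = "^.*" + core + ".*$"  # what you get if ^ and $ are added around .*core.*

# No flags: "." stops at newlines and ^/$ only anchor the whole input.
no_flags = re.search(wrapped, page)

# MULTILINE (the "m" in /gm): ^ and $ anchor every line, so the middle line can match.
multiline = re.search(wrapped, page, re.MULTILINE)

# DOTALL: "." also crosses newlines, so the pattern can span the whole input.
dotall = re.search(wrapped, page, re.DOTALL)

print("no flags :", no_flags)
print("MULTILINE:", multiline is not None)
print("DOTALL   :", dotall is not None)
```

So an anchored pattern that fails against multi-line input with no flags can still pass once the right flag is set, which would explain the difference between the /2 and /3 links.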

I have never been really good with regex and it’s always experimentation.
What I do is narrow it down:
start large with .*().*
and then start adding stuff inside bit by bit.
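
The narrowing-down approach can be tried outside openHAB first. This is only a sketch in Python, with a made-up sample string and re.fullmatch standing in for the "must match everything" behaviour, so treat the details as assumptions rather than what the binding really does.

```python
import re

# Hypothetical one-line sample; substitute a real chunk of the page.
sample = '{"type":"none","summary":"No rain for 120 min","updated":"12:00"}'

# Each step adds one more piece. The first step that stops matching
# (or starts capturing the wrong text) shows where the problem is.
steps = [
    r'.*(.*).*',
    r'.*"summary"(.*).*',
    r'.*"summary":"(.*).*',
    r'.*"summary":"(.*)","updated.*',
]

results = []
for pattern in steps:
    m = re.fullmatch(pattern, sample, re.DOTALL)
    results.append(m.group(1) if m else None)
    print(pattern, "->", repr(results[-1]) if m else "NO MATCH")
```

The capture shrinks with every step until only the summary text is left.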

Unfortunately openHAB REGEX doesn’t behave the way standard REGEX does. For one, your REGEX must match the entire message, hence Vincent’s recommendation of adding the .* at the start and end. Second, you can only have one match (i.e. one ()). If you have multiple matching groups there is no way to tell the REGEX transform which one you actually want. Your expression has two parenthesised groups, but both are lookarounds, which do not capture anything. Finally, you must supply a matching group that defines what part you want from the string.
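
To make those rules concrete, here is a minimal sketch in Python. regex_transform is a hypothetical stand-in written for this post, not the real openHAB code, and the sample string is made up. It only encodes the two rules described above: the pattern has to cover the whole message, and the result is whatever the first capturing group holds.

```python
import re
from typing import Optional


def regex_transform(pattern: str, text: str) -> Optional[str]:
    """Hypothetical stand-in for the REGEX transform rules described above."""
    m = re.fullmatch(pattern, text.strip(), re.DOTALL)
    if m is None:
        return None  # the pattern does not cover the whole message
    if m.re.groups == 0:
        return None  # no capturing group, so there is nothing to return
    return m.group(1)


# Hypothetical sample; the real page is much longer.
sample = '{"type":"none","summary":"No rain for 120 min","updated":"12:00"}'

original = r'(?<=","summary":")[\w\s\d]*(?=","updated")'
wrapped = r'.*(?<=","summary":")[\w\s\d]*(?=","updated").*'

# A plain search, which is what regex101 does, finds the text.
print(re.search(original, sample).group(0))

# The transform rules reject both versions, for different reasons.
print(regex_transform(original, sample))  # does not match the whole message
print(regex_transform(wrapped, sample))   # matches, but captures nothing
```

The wrapped version gets past the first rule and still returns nothing, because (?<=...) and (?=...) only assert what is around the match and do not create a capturing group.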

It looks like what you want is the stuff between "summary":" and ","updated" so the REGEX should look something like:

REGEX(.*"summary":"(.*)",\"updated.*)

The .* matches everything up to the "summary":", the "summary":" is the beginning marker for the text we actually care about, the (.*) matches and returns the text you want to extract, the ","updated is the ending marker for the text we actually care about and the final .* matches the rest of the string.
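
A quick way to check the pattern before pasting it into the Exec binding, sketched in Python with made-up sample strings (re.fullmatch with DOTALL is my assumption for how the whole-message match behaves):

```python
import re

# The pattern from the post above, unchanged.
pattern = r'.*"summary":"(.*)",\"updated.*'

# Hypothetical single-entry sample.
sample = '{"type":"none","summary":"No rain for 120 min","updated":"12:00"}'
single = re.fullmatch(pattern, sample, re.DOTALL).group(1)
print(single)

# Hypothetical sample where the markers appear twice.
two = '[{"summary":"Dry","updated":"12:00"},{"summary":"Rain","updated":"12:01"}]'

# Greedy .* runs as far as it can, so the LAST "summary" wins.
greedy = re.fullmatch(pattern, two, re.DOTALL).group(1)

# Lazy .*? stops as early as it can, so the FIRST "summary" wins.
lazy_pattern = r'.*?"summary":"(.*?)","updated.*'
lazy = re.fullmatch(lazy_pattern, two, re.DOTALL).group(1)

print(greedy, "/", lazy)
```

If the page ever contains more than one "summary" entry, switching to the lazy form (.*?) may be worth a try. Java regular expressions support it as well, but I have not tested it in the transform itself.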

You’re a rock star sir! Thanks.
Lesson learnt from your comment too - I understand the problem. Thanks Again

Please tick the solution post, thanks