Help with REGEX

ariela · August 14, 2018, 10:15am

Hi folks,

I have a regex that works:

(?<="tree":)(\d*)(?=,)

But this regex matches 8 times. I need to select only the first match, and in another Item possibly only the second.

Any clue how to accomplish that?

Thank you very much for your support

Andrea

rlkoshak · August 14, 2018, 5:02pm

You need to change your regular expression to only include one match. Then in a Rule further parse out the values you want. There is no way to handle multiple matches from a REGEX in OH as far as I know.
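To illustrate the "parse it further in a Rule" step, here is a minimal sketch of picking the first or second match once all matches are available. It is written in Python rather than the openHAB Rules DSL, purely to show the logic; the sample text and the `nth_match` helper are hypothetical, since the real page content is not shown in this thread.

```python
import re

# Hypothetical sample; the real page content is not shown in the thread.
text = '{"tree":2,"grass":1},{"tree":3,"grass":0},{"tree":0,"grass":4}'

# Same pattern as in the question, written with straight quotes.
pattern = re.compile(r'(?<="tree":)(\d*)(?=,)')

def nth_match(source, n):
    """Return the n-th match (1-based) as a string, or None if there are fewer."""
    matches = pattern.findall(source)
    return matches[n - 1] if 0 < n <= len(matches) else None

first_tree = nth_match(text, 1)
second_tree = nth_match(text, 2)
print(first_tree, second_tree)
```

One Item would use the first value and another Item the second, which is the split asked about in the original question.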

ariela · August 14, 2018, 5:56pm

Mmmm … that’s not possible right? Struggling in multiple ways

rlkoshak · August 14, 2018, 6:02pm

I don’t know what your text looks like, but it is always possible. Your REGEX has three sets of parens, though only (\d*) is a capturing group; the lookbehind and lookahead around it do not capture anything. But let’s say you wanted to match all the "tree":1234 type entries, assuming they are all next to each other. Then you would use a regex similar to .*?((?:"tree":\d*\s*)+).* which will match "tree": followed by zero or more digits followed by zero or more whitespace characters, one or more times, and return it all as one String.

Then you need to parse the String to extract the individual Items.
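A rough sketch of that two-step idea, in Python for illustration only. It assumes the entries really are adjacent, which the thread does not confirm, and the input string is made up. Note that it groups the repeated part with a non-capturing group `(?:...)` rather than square brackets, because square brackets define a character class and would not repeat the whole "tree":N sequence.

```python
import re

# Hypothetical input: the thread never shows text with adjacent entries.
text = 'x "tree":12 "tree":7 "tree":30 y'

# Lazy prefix, one capture group holding the whole run, greedy suffix.
run_pattern = r'.*?((?:"tree":\d*\s*)+).*'

m = re.fullmatch(run_pattern, text, flags=re.DOTALL)
run = m.group(1) if m else ""

# Second step: split the captured String into individual numbers.
tree_values = [int(v) for v in re.findall(r'"tree":(\d+)', run)]
print(tree_values)
```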

Again, you don’t show the original data, but this looks like it could be JSON formatted in which case JSONPATH would be a better choice.

ariela · August 14, 2018, 6:23pm

This is the original data:

view-source:https://weather.com/forecast/allergy/l/ITXX0042:1:it:
I’m trying to collect info about the pollen forecast. The data is JSON formatted but embedded in an HTML file, so there is no way to collect this via JSONPATH, right?

ariela · August 15, 2018, 1:24pm

Mmm this regex

(?<="tree":)(\d*)(?=,)

works on regex101, but is not accepted by openHAB.

rlkoshak · August 15, 2018, 2:21pm

As I said above, OH cannot handle multiple matches. Your REGEX has three sets of ( ), although only (\d*) is a capturing group; the lookbehind (?<=...) and lookahead (?=...) do not capture.

OH can only handle a REGEX that returns one match.

With that REGEX you will get multiple matches.
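One way to get a single match that still picks the first or second value is to make the pattern cover the whole input and keep exactly one capture group. The sketch below is Python, not openHAB, and it rests on an assumption worth checking against the openHAB docs: that the REGEX transformation needs the expression to match the entire input and returns the first capture group. The `transform` and `nth_tree_pattern` helpers and the sample text are hypothetical.

```python
import re

# Hypothetical sample standing in for the page source.
page = 'junk {"tree":2,"grass":1} more {"tree":3,"grass":0} tail'

def nth_tree_pattern(n):
    """Build a whole-input pattern whose only capture group is the n-th "tree" value."""
    # Skip lazily to the n-th "tree": marker, capture its digits, then swallow the rest.
    return r'(?:.*?"tree":){%d}(\d*),.*' % n

def transform(pattern, source):
    """Mimic a transform that needs a full match and returns capture group 1 (assumption)."""
    m = re.fullmatch(pattern, source, flags=re.DOTALL)
    return m.group(1) if m else None

first_tree = transform(nth_tree_pattern(1), page)
second_tree = transform(nth_tree_pattern(2), page)

# The original lookaround pattern cannot cover the whole input, so it yields nothing here.
original = transform(r'(?<="tree":)(\d*)(?=,)', page)
print(first_tree, second_tree, original)
```

Under that assumption, the lookaround version fails not because of its parentheses but because it only matches a fragment of the input, while the `{n}` repetition selects which occurrence ends up in the single capture group.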

When I browse to the allergy forecast for my area, the source of the web page is a bunch of JavaScript imports. I’m not getting any actual data.

ariela · August 15, 2018, 2:23pm

Sorry I’ve pasted the source page.

Try this:

rlkoshak · August 15, 2018, 2:54pm

I’m still just seeing a bunch of JavaScript. And I’m not super great at regular expressions. I pasted the body of that page into a regex tester and your regular expression returns nothing.

I assume you are trying to match the values in this area:

The following matches that section of the HTML.

[\s\S]*\{"id":"[\d.,]+","vt1pollenforecast":\[(.*\]}}).*

Note there is only one set of () in the regex. The stuff matched in that () is what gets returned to OH.

The part highlighted in green is what will be returned to OH by this expression.

Then you need to write a Rule to extract the tree values from that String and deal with them as desired.

What would be easiest would be to expand the regular expression to match the full JSON text for the forecast, not just the vt1pollenforecast part and then you can use JSONPATH in your Rule to extract the values you need.
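A minimal sketch of that approach: capture the full JSON object with one capture group, then read the tree values from the parsed JSON. This is Python, with the standard `json` module standing in for JSONPATH in a Rule. The HTML fragment and every field name other than "vt1pollenforecast" and "tree" are made up for illustration, so the real page will need a different pattern.

```python
import json
import re

# Hypothetical page fragment; the real structure is not shown in the thread.
html = (
    '<script>window.__data={"id":"45.46,9.19",'
    '"vt1pollenforecast":[{"day":1,"tree":2,"grass":1},'
    '{"day":2,"tree":3,"grass":0}]};</script>'
)

# One capture group that spans the whole JSON object, not just the array body.
json_pattern = r'.*?(\{"id":"[\d.,]+","vt1pollenforecast":\[.*?\]\}).*'

m = re.fullmatch(json_pattern, html, flags=re.DOTALL)
data = json.loads(m.group(1)) if m else {}

# Equivalent in spirit to a JSONPath such as $.vt1pollenforecast[0].tree
tree_values = [entry["tree"] for entry in data.get("vt1pollenforecast", [])]
print(tree_values)
```

With the values in a list, the first Item takes `tree_values[0]` and the second takes `tree_values[1]`.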