How to parse HTML data with REGEX in array

Hi folks,

I have a question. I’m parsing data from a website, and the regex works quite well:

Number Http_Meteoalarm_TodayTest "MeteoAlarm [%d]" { http="<[https://www.meteoalarm.eu/en_UK/0/0/AT002-Vorarlberg.html/:5000:REGEX(.*aw.*?([0-9]+[0-9]+).*.?)]"}

The problem is when there are two or more alerts: I would need an array, but at the moment I can see only the last value. Is there any chance of getting at least 3-4 items, given that a site has at most 3-4 alerts?

I’ve tried with:

Number Http_Meteoalarm_TodayTest "MeteoAlarm [%d]" { http="<[https://www.meteoalarm.eu/en_UK/0/0/AT002-Vorarlberg.html/:5000:REGEX(.*aw.*?([0-9]+[0-9]+).*.?[0])]"}

but I always see the second value, and not the first.

Any clue?
thanks
Andrea

Given the Item is a Number, what would you expect the Item’s state to be, assuming you were able to match the rest of the results? A Number is just that: a single numeric value.

You can use a regex to capture the full “array” of values into a String Item, and then a Rule to split the values further and assign them to their own unique Number Items.
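
To illustrate only the splitting step, here is a minimal Python sketch rather than an openHAB Rule. It assumes the String Item ends up holding something like “aw53 aw32”; the function name and the MeteoAlarm_Alert1, MeteoAlarm_Alert2 Item names are hypothetical and not taken from this thread.

```python
import re
from typing import List


def split_alert_codes(captured: str) -> List[int]:
    """Pull every awareness code out of a captured string such as 'aw53 aw32'."""
    # Each code is 'aw' followed by digits; keep only the digits as integers.
    return [int(code) for code in re.findall(r"aw(\d+)", captured)]


codes = split_alert_codes("aw53 aw32")

# Hypothetical target names: one Number Item per alert slot.
assignments = {f"MeteoAlarm_Alert{i}": code for i, code in enumerate(codes, start=1)}
print(assignments)  # {'MeteoAlarm_Alert1': 53, 'MeteoAlarm_Alert2': 32}
```

In a real setup the same logic would live in a Rule triggered when the String Item changes, with each value posted to its own Number Item.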


Ciao Rich,

thanks for your support here :slight_smile: much appreciated.

I’ve changed the “master item”, for example to use:
view-source:https://www.meteoalarm.eu/en_UK/0/0/AT008-Burgenland.html

String Http_Meteoalarm_TodayTest "MeteoAlarm [%s]" { http="<[https://www.meteoalarm.eu/en_UK/0/0/AT008-Burgenland.html/:5000:REGEX(.*aw.*?([0-9]+[0-9]+).*.?)]"}


If I understood your suggestion correctly, I should see the full “array” here. Right now there are 2 alerts (aw53 and aw32), but in the String item I can see only the second one (32).

What am I missing?

thanks
Andrea

All I can suggest is you need to update the regex to capture all the data. I suggest using one of the many online regex testers to experiment until you get all the data.

Rich,

the regex already matches both alerts (tested via regex101). But in the String item I see only the last one, not both …

In this example I see only “53”, and not “33 53”.

item:

String Http_Meteoalarm_TodayTest "MeteoAlarm [%s]" { http="<[https://www.meteoalarm.eu/en_UK/0/0/AT008-Burgenland.html/:5000:REGEX(.*aw.*?([0-9]+[0-9]+).*.?)]"}

There is a workaround without using the array:

  • set 12 items (aw1x, aw2x, …)
  • use a map for each item (e.g. aw1.map, where I put 1 = low risk, 2 = medium risk, 3 = high risk, 4 = extreme risk)
  • play in sitemap with “visibility”

But it doesn’t seem elegant at all :frowning:

I think you misunderstood what I was saying. OH regular expressions do not support multiple matches. So you need to write a regular expression that matches everything you want in one match.

What you need to match is the full list, not each part of the list individually.

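So if you have “aw53 aw32” as the list that you are trying to match against, you need a regex that looks something like .*?((?:aw\d+\s*)+).* which matches “aw” followed by one or more digits followed by zero or more spaces, repeated one or more times. Note that the repeated part has to be a group rather than a character class in square brackets, and the leading .* has to be lazy so it does not swallow the first entries.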
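
The difference between “one value per match” and “the whole list in one match” can be tried outside openHAB. The Python sketch below assumes the REGEX transformation must match the whole input and returns only the first capture group. The sample text is made up, and the grouped pattern is just one possible spelling of “aw, digits, optional whitespace, repeated”.

```python
import re
from typing import Optional

# Made-up sample text standing in for the fetched page content.
sample = "alerts: aw53 aw32 end"


def regex_transform(pattern: str, text: str) -> Optional[str]:
    """Mimic a whole-input match that returns only capture group 1."""
    match = re.fullmatch(pattern, text, flags=re.DOTALL)
    return match.group(1) if match else None


# Pattern from the thread: the greedy leading .* runs up to the last "aw",
# so only the final number ends up in the group.
single = regex_transform(r".*aw.*?([0-9]+[0-9]+).*.?", sample)

# One group spanning the whole run of codes, with a lazy prefix in front.
whole_list = regex_transform(r".*?((?:aw\d+\s*)+).*", sample)

print(single)              # 32
print(whole_list.strip())  # aw53 aw32
```

A tester such as regex101 reports both numbers for the first pattern only because it searches repeatedly. A single whole-input match yields one value per group, which is why the list has to be captured by one group and split afterwards.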
