RegEx bug?

(Justin) #1

I’m using a RegEx transform on an HTTP item, but it’s matching a completely wrong value and I can’t see why. regexr.com and regex101.com both confirm my expression is correct, so why is it matching the other value?
My .items file:

// Get current temperature
Number PWINFOCurrentTemp                      {http="<[PWINFOCache:1000:REGEX(^\\d*\\/\\d*\\/\\d* \\d*:\\d*:\\d* (-?\\d*\\.\\d*).*)]"}
Number PWINFOFeelsLikeTemp                    {http="<[PWINFOCache:1000:REGEX(.* (-?\\d*\\.\\d*) \\d.\\d \\d \\d $)]"}

The events log shows:

2019-04-08 00:52:05.490 [vent.ItemStateChangedEvent] - PWINFOFeelsLikeTemp changed from NULL to 0.5
2019-04-08 00:52:05.513 [vent.ItemStateChangedEvent] - PWINFOCurrentTemp changed from NULL to 16.4

The string being parsed is:

08/04/19 00:59:05 16.4 57 7.9 3 11 256 0.0 0.0 1012.30 WSW 1 km/h C hPa mm 4.2 -0.16 0.6 70.4 0.0 22.1 44 16.4 +0.5 16.8 00:00 16.4 00:58 5 00:33 16 00:19 1012.40 00:37 1011.96 00:01 1.9.4 1099 13 16.4 16.7 0.0 0.00 0 244 0.0 8 0 0 WSW 3496 ft 15.3 0.0 0 0 



(Udo Hartmann) #2

AFAIK you can’t use $ as “end of line”; you’ll have to match the preceding part of the string instead. How about

Number PWINFOFeelsLikeTemp                    {http="<[PWINFOCache:1000:REGEX(.* ft (-?\\d*\\.\\d*) \\d.\\d \\d \\d.*)]"}

?


(Justin) #3

I couldn’t find any reference to not using $ for EOL in the RegEx Transform docs. Do you know if it’s mentioned anywhere in particular?
In this case I settled on REGEX((?:\\S*\\s){54}(\\S*).*), since the output is space-delimited with no figures omitted and should stay constant (unless a software update changes it). I still wanted to check for future reference, so I don’t spend another 2 hours pulling my hair out wondering why it’s not working.
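
For anyone who wants to try the field-skipping pattern outside openHAB, here is a minimal sketch in plain Java. It assumes the transform behaves like a whole-string match (Java's `Matcher.matches()`), which is not confirmed in this thread; the class name `FieldSkipDemo` and the helper `extract` are made up for illustration. `SAMPLE` is the line from the first post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FieldSkipDemo {
    // Sample line copied from the first post, including its trailing space.
    static final String SAMPLE =
        "08/04/19 00:59:05 16.4 57 7.9 3 11 256 0.0 0.0 1012.30 WSW 1 km/h C hPa mm "
        + "4.2 -0.16 0.6 70.4 0.0 22.1 44 16.4 +0.5 16.8 00:00 16.4 00:58 5 00:33 16 "
        + "00:19 1012.40 00:37 1011.96 00:01 1.9.4 1099 13 16.4 16.7 0.0 0.00 0 244 "
        + "0.0 8 0 0 WSW 3496 ft 15.3 0.0 0 0 ";

    // Hypothetical helper: returns capture group 1 if the pattern
    // matches the whole input, otherwise null.
    static String extract(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Skip 54 space-delimited fields and capture the 55th.
        System.out.println(extract("(?:\\S*\\s){54}(\\S*).*", SAMPLE)); // prints 15.3
        // The same idea for the current temperature (3rd field).
        System.out.println(extract("(?:\\S*\\s){2}(\\S*).*", SAMPLE));  // prints 16.4
    }
}
```

The pattern depends only on the field position, so it breaks if the source ever adds or drops a field or uses more than one space between fields.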


(Udo Hartmann) #4

Well, no, I doubt that. It’s more a “look at any given example” :wink:


(Justin) #5

The documentation only gives 4 examples, and neither ^ nor $ is mentioned. ^ worked in my first rule, and the only difference noted compared to plain RegEx is having to double the backslash to escape a character, which I’ve seen in many programming contexts, so that didn’t strike me as odd. What does seem really odd is that it otherwise seems to use a standard RegEx engine, with the exception that $ does not seem to be recognised.
The absence of $ in other examples doesn’t show that it’s an undocumented ‘feature’; it may simply not have been needed in those cases. If $ is indeed invalid, shouldn’t that be raised as a bug, so that either the documentation or the transform is updated and the two align?


(Udo Hartmann) #6

Hmm. I doubt that :wink: In fact, you always have to use .* as the first part of a REGEX, if the search term is not at the very beginning. I’m pretty sure that this is mentioned “somewhere” :wink:


(Justin) #7

<[PWINFOCache:1000:REGEX(^\\d*\\/\\d*\\/\\d* \\d*:\\d*:\\d* (-?\\d*\\.\\d*).*)]

That’s the first rule I made. It searches from the beginning of the string to avoid any chance of it reading from elsewhere (it was originally reading from a 3 day log file), and it worked perfectly. The value is at the start of the string (just as the value I was trying to match with $ was at the end of the string), so ^ seems logical to use, just like $ for the end. So it looks like you don’t “always” have to use .* at the start, and the above is an example that shows that :wink:


(Udo Hartmann) #8

Yep, but that’s the beginning of the string anyway. The point is, REGEX will always apply ^ to the search string, whether you use it or not.

REGEX(My Value (\\d.) .*)

in openHAB is the same as

REGEX(^My Value (\\d.) .*)
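
A rough way to see this equivalence in plain Java, assuming (as a guess, not something confirmed here) that the transform does a whole-string match like `Matcher.matches()`. The class `ImplicitAnchorDemo`, the helper `fullMatchGroup` and the sample inputs are invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ImplicitAnchorDemo {
    // Hypothetical helper: group 1 if the pattern matches the WHOLE input, else null.
    static String fullMatchGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String atStart = "My Value 42 units";
        String notAtStart = "Sensor: My Value 42 units";

        // With a whole-string match, a leading ^ changes nothing.
        System.out.println(fullMatchGroup("My Value (\\d.) .*", atStart));     // prints 42
        System.out.println(fullMatchGroup("^My Value (\\d.) .*", atStart));    // prints 42

        // If the text is not at the very beginning, the pattern needs a leading .*
        System.out.println(fullMatchGroup("My Value (\\d.) .*", notAtStart));   // prints null
        System.out.println(fullMatchGroup(".*My Value (\\d.) .*", notAtStart)); // prints 42
    }
}
```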

(Justin) #9

That I know. The other rule I was trying to write parses from the end of the string, where the string ends with (digit)(whitespace)(end). My pattern of \\d $ should match only at the end of the string (regexr.com confirmed that; again, I wanted to rule out any chance of multiple matches), but it doesn’t. So why does OH accept ^ but not $?


(Justin) #10

Also, it’s not that OH automatically adds $ at the end of the expression, because omitting it did not match either.


(Rich Koshak) #11

Another difference is in the OH context, you must match the full string and use parens to indicate what parts you want returned. I don’t know if this is because of the library used or how the library is used, but in all cases I’ve seen, the REGEX must match the full String.

I don’t know if somehow you are not successfully matching the full String with the $ or for some reason the $ doesn’t work in OH.

I think that is an important point. Because OH requires the expression to always match the full string, not just a single line, the ^ and $ are redundant.

Standard REGEX doesn’t require a match of the full string. In fact, that is considered a bad thing to do as it’s inefficient. For whatever reason, though, the OH implementation requires the expression to match the full message/String.

That strongly implies that the expression didn’t match the full String then. There is something in the String not matched by the expression.
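
One way this could happen, sketched in plain Java. Two assumptions here are guesses rather than anything confirmed in this thread: that the transform does a whole-string match like `Matcher.matches()`, and that leading and trailing whitespace is stripped from the input before matching. The class `TrailingSpaceDemo` and helper `fullMatchGroup` are made up, and `TAIL` is just the end of the sample line from the first post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TrailingSpaceDemo {
    // End of the sample line from the first post, including its trailing space.
    static final String TAIL = "WSW 3496 ft 15.3 0.0 0 0 ";

    // The two expressions from the thread, written as Java string literals.
    static final String WITH_DOLLAR = ".* (-?\\d*\\.\\d*) \\d.\\d \\d \\d $";
    static final String WITH_FT = ".* ft (-?\\d*\\.\\d*) \\d.\\d \\d \\d.*";

    // Hypothetical helper: group 1 if the pattern matches the WHOLE input, else null.
    static String fullMatchGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Raw input: the space before $ is present, so the expression matches.
        System.out.println(fullMatchGroup(WITH_DOLLAR, TAIL));        // prints 15.3
        // Trimmed input (assumption): no trailing space is left for " $" to match.
        System.out.println(fullMatchGroup(WITH_DOLLAR, TAIL.trim())); // prints null
        // The " ft " variant with a trailing .* matches either way.
        System.out.println(fullMatchGroup(WITH_FT, TAIL));            // prints 15.3
        System.out.println(fullMatchGroup(WITH_FT, TAIL.trim()));     // prints 15.3
    }
}
```

If that is what is going on, the $ itself is harmless; the literal space in front of it would be the part of the expression that nothing in the String matches.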
