Could use some help with regex

Hi,

I would really appreciate some help with regular expressions. I just can’t get my head around how to pick data from MQTT. Hope that there’s someone out there who knows how to do this.
I have this coming from a device sending data via MQTT:
47015.171\r\n2018-05-09T20:55:00.0000000Z
I need the number (including decimals) and the date and time put into different items.

But how on earth do I pick each variable?

One way to go about this is to stuff the whole data string received from the binding into a single variable and then have a rule to pick that variable apart and populate other variables when it changes.
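
As a rough illustration of the "pick it apart" step, here is a minimal sketch in plain JavaScript. It is not an openHAB rule, the variable names are made up, and it assumes the \r\n in the payload is a real carriage return and line feed pair.

```javascript
// Sample payload from the question (assumed to contain a real CR LF pair).
var payload = "47015.171\r\n2018-05-09T20:55:00.0000000Z";

// Split on the line break: first part is the reading, second the timestamp.
var parts = payload.split("\r\n");
var reading = parseFloat(parts[0]); // numeric value
var timestamp = parts[1];           // timestamp text, left unparsed

console.log(reading, timestamp);
```

In a rule, each of the two values would then be posted to its own Item.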

The best way to build regular expressions is with one of the many online testers.

https://www.regexpal.com

(\d*\\.\d).*

Should match the number

.*\\r\\n(.*)

Should match the date part.
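
To see what the two capture groups pick up, the same idea can be tried in plain JavaScript. This is only a sketch: it runs outside openHAB, uses regex literals (so no string escaping is involved), and assumes the \r\n is a real CR LF pair.

```javascript
// Sample payload from the question (assumed to contain a real CR LF pair).
var payload = "47015.171\r\n2018-05-09T20:55:00.0000000Z";

// Group 1: digits, a literal dot, then more digits.
var numberMatch = /(\d*\.\d*)/.exec(payload);

// Group 1: everything that follows the line break.
var dateMatch = /\r\n(.*)/.exec(payload);

console.log(numberMatch[1]); // the number part
console.log(dateMatch[1]);   // the timestamp part
```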

Thank you Rick. I’ve tried a lot of online testers, but I can’t get my head around how regex works (at all!).

Having an item:

String Power_Export "Export:[%s]" { mqtt="<[mosquitto:Electricity/EL/Energy/Import/kWh:state:REGEX((.*?))]" }

correctly provides the entire string. If I do:

String Power_Export "Export:[%s]" { mqtt="<[mosquitto:Electricity/EL/Energy/Import/kWh:state:REGEX((.*\\r\\n(.*)|(\d*\\.\d).*))]" }

then I get a lot of errors:

[1,117]: mismatched input '*' expecting RULE_STRING
[1,122]: missing '}' at 'd'
[1,123]: mismatched input ')' expecting RULE_ID
[2,37]: no viable alternative at input '%'
[2,40]: mismatched input '%' expecting RULE_ID
[2,72]: extraneous input ':' expecting RULE_ID
[4,211]: mismatched character '<EOF>' expecting '"'
2018-05-10 18:40:41.089 [WARN ] [el.core.internal.ModelRepositoryImpl] - Configuration model 'Electricity.items' has errors, therefore ignoring it: [1,42]: mismatched character 'd' expecting set null
[1,117]: mismatched input '*' expecting RULE_STRING
[1,122]: missing '}' at 'd'
[1,123]: mismatched input ')' expecting RULE_ID
[2,37]: no viable alternative at input '%'
[2,40]: mismatched input '%' expecting RULE_ID
[2,72]: extraneous input ':' expecting RULE_ID
[4,211]: mismatched character '<EOF>' expecting '"'

So clearly combining the two expressions doesn’t work. I assume I need to use the ‘|’ to separate the two expressions?

First of all, why would you want to combine the two expressions? You need a separate Item for the number and the date.

The errors indicate there is a syntax error. I suspect that the OH regex cannot handle multiple and embedded matches. The regex you have provided has three matches:

  1. any number of characters that follow any number of characters followed by a \r\n
  2. any number of digits followed by a . followed by any number of digits
  3. the full string that matches 1 or 2

I’m pretty sure the regex transformation needs just one match, not three.

The “|” means or, so the expression will match either the first or the second expression. Typically when using | one puts the alternatives inside ().
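
A small sketch in plain JavaScript (not openHAB) shows how the combined expression behaves: the | separates two alternatives, parentheses group them, and the result carries three capture groups of which only some are filled. It assumes the \r\n is a real CR LF pair.

```javascript
// Sample payload from the question (assumed to contain a real CR LF pair).
var payload = "47015.171\r\n2018-05-09T20:55:00.0000000Z";

// The combined expression from the post, written as a regex literal.
var combined = /(.*\r\n(.*)|(\d*\.\d).*)/;
var m = combined.exec(payload);

console.log(m.length - 1); // number of capture groups in the pattern
console.log(m[2]);         // filled by the first alternative (date part)
console.log(m[3]);         // second alternative was not used
```

Because the first alternative already matches, the group from the second alternative stays empty, which is one reason two separate expressions are easier to work with.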

There isn’t enough space here to provide a regular expression tutorial. This one seems to be pretty thorough.

https://www.regular-expressions.info/tutorial.html

But what you want is something like:

Number Power { mqtt="<[mosquitto:Electricity/EL/Energy/Import/kWh:state:REGEX((\d*\\.\d).*)]" }
DateTime Power_Time { mqtt="<[mosquitto:Electricity/EL/Energy/Import/kWh:state:REGEX(.*\\r\\n(.*))]" }

Thank you @rlkoshak for your help so far. This regex thing is too complex for me :).

I can’t explain why I wanted to use one expression, guess it’s a product of too much Googling…

I now have the two examples you provided and neither of them is working, unfortunately :(.

Number Power { mqtt="<[mosquitto:Electricity/EL/Energy/Import/kWh:state:REGEX((\d*\.\d).*)]" }

gives me:

2018-05-13 21:13:06.925 [WARN ] [el.core.internal.ModelRepositoryImpl] - Configuration model 'Electricity.items' has errors, therefore ignoring it: [7,21]: mismatched character 'd' expecting set null
[7,82]: mismatched input '*' expecting RULE_STRING
[7,87]: missing '}' at 'd'
[7,88]: mismatched input ')' expecting RULE_ID
[8,40]: extraneous input ':' expecting RULE_ID
[8,100]: mismatched character '<EOF>' expecting '"'

and

DateTime Power_Time { mqtt="<[mosquitto:Electricity/EL/Energy/Import/kWh:state:REGEX(.*\r\n(.*))]" }

throws:

2018-05-13 21:12:12.055 [ERROR] [.mqtt.internal.MqttMessageSubscriber] - Error processing MQTT message.
java.util.regex.PatternSyntaxException: Unclosed group near index 11
^.*\r\n(.*$
^
at java.util.regex.Pattern.error(Unknown Source) [?:?]
at java.util.regex.Pattern.accept(Unknown Source) [?:?]
at java.util.regex.Pattern.group0(Unknown Source) [?:?]
at java.util.regex.Pattern.sequence(Unknown Source) [?:?]
at java.util.regex.Pattern.expr(Unknown Source) [?:?]
at java.util.regex.Pattern.compile(Unknown Source) [?:?]
at java.util.regex.Pattern.<init>(Unknown Source) [?:?]
at java.util.regex.Pattern.compile(Unknown Source) [?:?]
at org.eclipse.smarthome.transform.regex.internal.RegExTransformationService.transform(RegExTransformationService.java:65) [216:org.eclipse.smarthome.transform.regex:0.10.0.b1]
at org.openhab.core.transform.TransformationHelper$TransformationServiceDelegate.transform(TransformationHelper.java:65) [232:org.openhab.core.compat1x:2.2.0]
at org.openhab.binding.mqtt.internal.MqttMessageSubscriber.processMessage(MqttMessageSubscriber.java:138) [246:org.openhab.binding.mqtt:1.11.0]
at org.openhab.io.transport.mqtt.internal.MqttBrokerConnection.messageArrived(MqttBrokerConnection.java:556) [245:org.openhab.io.transport.mqtt:1.11.0]
at org.eclipse.paho.client.mqttv3.internal.CommsCallback.deliverMessage(CommsCallback.java:475) [245:org.openhab.io.transport.mqtt:1.11.0]
at org.eclipse.paho.client.mqttv3.internal.CommsCallback.handleMessage(CommsCallback.java:379) [245:org.openhab.io.transport.mqtt:1.11.0]
at org.eclipse.paho.client.mqttv3.internal.CommsCallback.run(CommsCallback.java:183) [245:org.openhab.io.transport.mqtt:1.11.0]
at java.lang.Thread.run(Unknown Source) [?:?]

The last is probably due to the fact that the regex includes \r\n, but I just can’t figure out how to remove them :triumph:

Given that I really struggle with Regex, could you please provide an example of how to do this?

I believe the binding is cutting off the last character in your regex, therefore not seeing the closing parenthesis. You can test this by adding a space after the last closing ). That’s a problem that should be fixed in the latest snapshot.

I’ll also take a moment here to issue a rant about how the transformation service is prepending ^ and appending $ to the transform. That makes no sense whatsoever.
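
What that wrapping does in general can be sketched in plain JavaScript. This only illustrates the effect of adding ^ and $ around a pattern; the helper name wrapAnchored and the sample text are made up for the example, and openHAB's actual Java implementation may differ in flags and other details.

```javascript
// Hypothetical helper: wraps a pattern in ^ and $ as described in the post.
function wrapAnchored(pattern) {
    return new RegExp("^" + pattern + "$");
}

var sample = "value=47015.171"; // made-up single-line input

// Unanchored, the pattern finds the number anywhere in the text.
var loose = new RegExp("(\\d+\\.\\d+)").exec(sample);

// Anchored, the same pattern has to cover the whole text, so it fails.
var strict = wrapAnchored("(\\d+\\.\\d+)").exec(sample);

// Padding the pattern with .*? lets it span the full input again.
var padded = wrapAnchored(".*?(\\d+\\.\\d+)").exec(sample);

console.log(loose[1], strict, padded[1]);
```

The practical consequence is that an expression handed to the transform has to account for the entire message, not just the part of interest.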

That removed the “Unclosed group” error! Thx!
Don’t get any data into the DateTime item, most likely due to the \r\n. No matter what I do I either get an error in the expression, or no data :frowning:

Neither does the rant (to me) :rofl:

This may sound silly, but would it be easier to change the firmware on your MQTT device (I assume it’s an ESP or Arduino) so that it sends two messages, or so that the two values are separated by something other than \r\n? Or change the format to JSON.
It would then be VERY easy on the OH side:
Changing the firmware to change 2 characters in the string, or to switch to JSON - 5 minutes
Upload firmware - 5 minutes
JS or JSONPATH transformation instead of REGEX for the two items in OH - 10 minutes
TOTAL 20 minutes

Total amount of time lost on REGEX so far: 4 days

In an ideal world that would be the solution. However the MQTT setup in the case is fixed and cannot be altered, so I’m stuck with what I have.
For those skilled in regex it’s probably very easy, but as I’ve said it’s too complicated for me to fix myself :confused:

You have the first one working to retrieve the value, I gather.
The date part is fixed length, so we can use a JS transform to retrieve it.

Create a file called test.js in your conf/transform folder with the following content:

(function(i) {
    // The timestamp is the last 28 characters of the payload.
    var dateString = i.slice(-28);
    var d = new Date(dateString);
    // Return seconds since the Unix epoch.
    return d.getTime() / 1000;
})(input)
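
The slicing step of the transform can be checked outside openHAB with a short sketch like this one. It assumes the \r\n is a real CR LF pair and leaves out the Date parsing, since whether seven fractional digits are accepted may depend on the JavaScript engine.

```javascript
// Sample payload from the question (assumed to contain a real CR LF pair).
var payload = "47015.171\r\n2018-05-09T20:55:00.0000000Z";

// The timestamp is 28 characters long, so a negative slice isolates it.
var dateString = payload.slice(-28);

console.log(dateString.length); // length of the extracted text
console.log(dateString);        // the timestamp on its own
```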

Your DateTime item becomes:

DateTime Power_Time { mqtt="<[mosquitto:Electricity/EL/Energy/Import/kWh:state:JS(test.js)]" }

  1. Your regex seems not to be the same as posted by @rlkoshak.

  2. I think as your regex is embedded into a string you have to properly escape the backslash.

https://docs.openhab.org/addons/transformations/regex/readme.html#differences-to-plain-regex

  3. As it seems that you want to match a multiline string you need something like
"REGEX(s/.+\\s.+(.*)/$1/g)"

Explained in detail in this post.

I did not test it but it should be something like

Number Power { mqtt="<[mosquitto:Electricity/EL/Energy/Import/kWh:state:REGEX((\\d*.\\d*).*)]" }
DateTime Power_Time { mqtt="<[mosquitto:Electricity/EL/Energy/Import/kWh:state:REGEX(s/.*\\s(.*)/$1/g)]" }

Thanks for your links and suggestion. I’m out travelling but will try it when I get back. If the Regex method doesn’t work for me, then I’ll try to persuade the developer into changing his format to JSON instead :slight_smile:

@Josar The power item is working as it should :). Thank you for that. Unfortunately the DateTime item results in:

2018-05-15 21:42:11.745 [WARN ] [b.core.events.EventPublisherDelegate] - given new state is NULL, couldn't post update for 'Power_Time'

:frowning: Soooo close. Any idea why?

I was a little worried about this. 2018-05-09T20:55:00.0000000Z is a little fishy. I went down the rabbit hole, though I didn’t get that far. While that is ISO 8601 format, the Wikipedia article says:

However, the number of decimal places needs to be agreed to by the communicating parties. For example, in Microsoft SQL Server, the precision of a decimal fraction is 3, i.e., “yyyy-mm-ddThh:mm:ss[.mmm]”.

I’m willing to bet that Joda or Date also only supports three decimal places. And if some of those zeros are supposed to denote the time zone offset, then it is missing the + sign and it is not valid ISO 8601 format.

But, however you look at it, I think you need to strip the . and the seven zeros off of the date which you should be able to do with something like:

REGEX(s/.*\\s(.*)\..*/$1/g) 

which will only match the date up to the “.”.
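
The stripping idea can be tried in plain JavaScript with a sketch like the following. It is not the openHAB transform itself: it uses a regex literal, assumes the \r\n is a real CR LF pair, and uses [\s\S]* for the leading part because "." does not cross line breaks in JavaScript.

```javascript
// Sample payload from the question (assumed to contain a real CR LF pair).
var payload = "47015.171\r\n2018-05-09T20:55:00.0000000Z";

// Group 1 is everything between the last whitespace character and the
// last "." of the timestamp; the replacement keeps only that group.
var trimmed = payload.replace(/^[\s\S]*\s(.*)\..*$/, "$1");

console.log(trimmed); // timestamp without the fractional seconds
```

Note that this also drops the trailing Z along with the fractional seconds.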

Thx @rlkoshak. This gives me:

2018-05-15 22:10:34.553 [WARN ] [el.core.internal.ModelRepositoryImpl] - Configuration model 'Electricity.items' has errors, therefore ignoring it: [14,34]: mismatched character '.' expecting set null
[14,105]: mismatched input '.' expecting RULE_STRING
[14,109]: missing '}' at '1'
[14,110]: extraneous input '/' expecting RULE_ID
[20,121]: mismatched character '<EOF>' expecting '"'

I feel really noobish…

You probably have to double escape the .

REGEX(s/.*\\s(.*)\\..*/$1/g)