(Another) REGEX question

I love JSONPath, but unfortunately REGEX is required in a lot of my approaches, and I really struggle with some of the (apparently) easiest tasks.

I would like to extract, let’s say, the Name from the following string:
"05.09.2019";"05.09.2019";"Vorname";"Name";"Strasse";"PLZ / Ort";"Tel. Nummer";"xxx";"yyy";"zzz";"0815";

How can I split on the “;” while ignoring numbers and other characters (* does not work, and \w does not either)?

If I use something like “;”, I can find the 10 matches on REGEX101.
But how do I extract only the Name, i.e. get everything between the “;” and the next occurrence of “;”?
The Name can contain [a-zA-Z] as well as [0-9] and/or -.

You mean you want to get the fourth element of a semicolon-separated list of values?

In a rule, I would do:

val myList = sourceString.split(";")    // split the record at every semicolon
val myTarget = myList.get(3)            // counts from 0, so this is the 4th field, quotes included
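For anyone who wants to try the split idea outside openHAB, here is a rough Python equivalent of those two rule lines (a sketch only; `str.split` stands in for the rule's `split`, and the variable names just mirror the rule). The quotes stay part of the field unless you strip them.

```python
# Rough Python equivalent of the rule: split on ";" and take index 3.
source_string = ('"05.09.2019";"05.09.2019";"Vorname";"Name";"Strasse";'
                 '"PLZ / Ort";"Tel. Nummer";"xxx";"yyy";"zzz";"0815";')

my_list = source_string.split(";")  # in Python the trailing ";" adds an empty last element
my_target = my_list[3]              # fourth field, counting from 0: '"Name"' with its quotes
name_only = my_target.strip('"')    # drop the surrounding double quotes

print(my_target)
print(name_only)
```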

As you know, string operations like .split() are based on REGEX, but I have no idea how you would rewrite this for the openHAB REGEX Transformation Service, which, as you also know, is not plain REGEX.

Edit: bear in mind it all goes wrong if elements can contain a semicolon character inside the quotes.
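A small Python illustration of that caveat (the sample line, with a semicolon inside the Name field, is made up): a plain split cuts the quoted field in two, while Python's `csv` module, told that `;` is the delimiter and `"` the quote character, keeps it whole. This is a Python-side workaround only; it is not something the REGEX transform does for you.

```python
import csv

# Made-up record whose Name field contains a semicolon inside the quotes.
line = '"05.09.2019";"05.09.2019";"Vorname";"Name; Jr.";"Strasse";'

# Plain splitting ignores the quotes and breaks the field apart.
naive = line.split(";")
# naive[3] is '"Name', only the first half of the field

# csv.reader respects the quoting, so the embedded ";" stays in the field
# and the surrounding quotes are removed.
row = next(csv.reader([line], delimiter=";", quotechar='"'))
# row[3] is 'Name; Jr.'
print(naive[3])
print(row[3])
```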


Oh, Lord!
I am an idiot!
Thank you very much for pointing out the (truly) obvious!
:smiley:

Well, you might need it as a transform for use in a binding rather than in a rule. In that case I’d cheat and use a JS (JavaScript) transform with a similar method to the rule :wink:

.*;.*;.*;(.*);.*

or

[.*;]{3}(.*);.*

In words: Match any number of characters .* until you see a semicolon ;, repeat that three times {3}. Grab any characters after that point to the next semicolon (.*);. Then match the rest of the string .*.

The stuff in parens is what gets returned. If you don’t want the quotes returned:

[.*;]{3}"(.*)";.*

Be careful when learning about regex from other sites, as openHAB is weird in how it handles regex. The pattern must match the whole String, and only the stuff that matches in the first group (i.e. between the ( )) gets returned. This is not standard regex behavior.
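To see the difference, here is a Python sketch that mimics the behaviour described above with `re.fullmatch` (whole-string match) and `group(1)`. The helper `oh_style_regex` is hypothetical, not an openHAB API, and returning `None` when nothing matches is just this sketch's choice; openHAB itself may do something else in that case.

```python
import re

def oh_style_regex(pattern, text):
    """Hypothetical helper: the pattern must match the WHOLE text and only
    the first capture group is returned; None if there is no full match."""
    m = re.fullmatch(pattern, text)
    return m.group(1) if m else None

s = ('"05.09.2019";"05.09.2019";"Vorname";"Name";"Strasse";'
     '"PLZ / Ort";"Tel. Nummer";"xxx";"yyy";"zzz";"0815";')

# A standard regex search is happy to find a match somewhere inside the text ...
found = re.search(r'"(Vorname)"', s).group(1)       # 'Vorname'
# ... but the same pattern does not cover the whole string, so nothing comes back,
strict = oh_style_regex(r'"(Vorname)"', s)          # None
# ... until .* is added on both sides to swallow the rest of the string.
wrapped = oh_style_regex(r'.*"(Vorname)".*', s)     # 'Vorname'
```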

NOTE: ; is not a special character in regex, so it does not need to be escaped with a \.

I played around with these, because I assumed they would work.
They do not: the example above returns the entry in front of the last ; and ignores everything before it.
So this one returns the same result:

.*;(.*);.*
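A quick Python check of that observation (Python's `re` stands in for the openHAB engine here, using `re.fullmatch` so the whole string must match): greedy `.*` runs as far right as it can, so both patterns end up capturing the last field.

```python
import re

s = ('"05.09.2019";"05.09.2019";"Vorname";"Name";"Strasse";'
     '"PLZ / Ort";"Tel. Nummer";"xxx";"yyy";"zzz";"0815";')

# Both patterns use greedy .*, which grabs as much text as possible
# and only gives back the minimum needed for the remaining ";"s to match.
results = {p: re.fullmatch(p, s).group(1)
           for p in (r'.*;.*;.*;(.*);.*', r'.*;(.*);.*')}

for pattern, field in results.items():
    print(pattern, '->', field)   # each captures '"0815"', the field before the last ;
```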

One of the challenges of regular expressions is dealing with greedy evaluation and lazy evaluation. This is one of the cases where we need to use lazy evaluation instead of greedy evaluation.

.*?;.*?;.*?;.*?(.*?);.*

For whatever reason, I can’t get the {3} to work.

The above works in a regex tester. The ? turns the match from greedy (all the way to the last ;) to lazy (stops at the first ;).
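A Python sketch of the lazy version, plus one likely reason the `{3}` attempts fail (my reading, not something stated in the thread): in `[.*;]{3}` the square brackets form a character class, so it matches exactly three characters, each a literal `.`, `*` or `;`. Repeating the whole "anything up to the next ;" unit needs a group instead; the non-capturing `(?:...)` form below is my substitution. Python's `re.fullmatch` again stands in for openHAB's whole-string matching.

```python
import re

s = ('"05.09.2019";"05.09.2019";"Vorname";"Name";"Strasse";'
     '"PLZ / Ort";"Tel. Nummer";"xxx";"yyy";"zzz";"0815";')

# Lazy pattern from the post: each .*? stops at the FIRST following ";".
lazy = re.fullmatch(r'.*?;.*?;.*?;.*?(.*?);.*', s).group(1)     # '"Name"'

# [.*;] is a character class: ONE character that is ".", "*" or ";".
# The string starts with a double quote, so [.*;]{3} cannot match at all.
bracketed = re.fullmatch(r'[.*;]{3}(.*);.*', s)                  # None

# (?:.*?;) groups "anything, lazily, up to a ;" so {3} repeats the whole unit.
# (?: ) does not capture, so the field is still in group 1.
repeated = re.fullmatch(r'(?:.*?;){3}(.*?);.*', s).group(1)      # '"Name"'
```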

That really works in REGEX101. Thanks a lot, Rich!
But shouldn’t it be like this to get “Name”?
Why the last .*? before the bracket?
.*?;.*?;.*?;(.*?);.*
(to start right after the third semicolon?)

Yes. The ? makes the match lazy instead of greedy, so .*?;.*?;.*?; advances to right after the third semicolon. The (.*?); tells the OH regex transformation to return everything after that third ; up to the next ;. Again, the ? makes it a lazy evaluation, which prevents it from gobbling up all the semicolons until the last one.
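A Python check of why every segment needs its own `?` (again with `re.fullmatch` standing in for openHAB's whole-string matching): if only the first `.*` is lazy, the greedy segments after it jump ahead to the last semicolons again.

```python
import re

s = ('"05.09.2019";"05.09.2019";"Vorname";"Name";"Strasse";'
     '"PLZ / Ort";"Tel. Nummer";"xxx";"yyy";"zzz";"0815";')

# Every segment lazy: stops at the 1st, 2nd and 3rd ";" in turn.
all_lazy = re.fullmatch(r'.*?;.*?;.*?;(.*?);.*', s).group(1)    # '"Name"'

# Only the first segment lazy: the two greedy .* run as far right as they can.
mixed = re.fullmatch(r'.*?;.*;.*;(.*?);.*', s).group(1)         # '"0815"'
```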

Another approach is to stick with greedy evaluation and work from the back of the string forward.

.*;(.*);.*;.*;.*;.*;.*;.*;.*;

Because the greedy evaluation will match to the last found ;, we tell it we want the 8th field from the end with that regex.
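A Python sketch of the back-to-front idea, assuming the greedy prefix ends with a `;` right before the group (`.*;(.*);` followed by seven `.*;`, i.e. eight semicolons from the group to the end of the string). Without that `;` in front of the group, the leading greedy `.*` swallows the field and the group comes back empty, as the second check shows.

```python
import re

s = ('"05.09.2019";"05.09.2019";"Vorname";"Name";"Strasse";'
     '"PLZ / Ort";"Tel. Nummer";"xxx";"yyy";"zzz";"0815";')

# Greedy prefix up to a ";", the field, then its own ";" plus seven ".*;" segments.
from_back = r'.*;(.*);' + r'.*;' * 7
field = re.fullmatch(from_back, s).group(1)        # '"Name"', the 8th field from the end

# Same idea without the ";" before the group: the prefix .* eats the field.
no_separator = r'.*(.*);' + r'.*;' * 7
empty = re.fullmatch(no_separator, s).group(1)     # ''
```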

Most of the time you will be dealing with more distinctive delimiters when using REGEX, so the greediness isn’t a problem. The fact that every field uses the same delimiter is what makes it a problem here.

To overcome the greedy problem you may use [^;], which means “match any character except ;”. So
[^;]*;[^;]*;[^;]*;([^;]*);.*
should do the job.
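In Python (with `re.fullmatch` standing in for openHAB's whole-string matching), the negated class behaves as described: `[^;]*` can never run past a semicolon, so greediness stops mattering. The quote-free variant and the `field_pattern` helper are my own additions for illustration, not from the thread.

```python
import re

s = ('"05.09.2019";"05.09.2019";"Vorname";"Name";"Strasse";'
     '"PLZ / Ort";"Tel. Nummer";"xxx";"yyy";"zzz";"0815";')

# Pattern from the post: each [^;]* covers exactly one field.
quoted = re.fullmatch(r'[^;]*;[^;]*;[^;]*;([^;]*);.*', s).group(1)     # '"Name"'

# Variant that keeps the double quotes outside the capture group.
bare = re.fullmatch(r'[^;]*;[^;]*;[^;]*;"([^;"]*)";.*', s).group(1)    # 'Name'

def field_pattern(index):
    """Hypothetical helper: same pattern shape for any 0-based field index."""
    return r'[^;]*;' * index + r'([^;]*);.*'

phone = re.fullmatch(field_pattern(6), s).group(1)                    # '"Tel. Nummer"'
```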
Maybe
s/([^;]*;){3}([^;]*);.*/$2/
works as well, at least in Regex101.
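The same substitution in Python, as a sketch (Python's `re.sub` writes the back-reference as `\2` rather than `$2`; whether the openHAB transform accepts the `s/.../$2/` form is not confirmed here). It also shows why the field lands in group 2: a repeated capturing group only keeps its last repetition. If only group 1 is returned, a non-capturing `(?:...)` group, my suggestion rather than something from the thread, keeps the field in group 1.

```python
import re

s = ('"05.09.2019";"05.09.2019";"Vorname";"Name";"Strasse";'
     '"PLZ / Ort";"Tel. Nummer";"xxx";"yyy";"zzz";"0815";')

# s/([^;]*;){3}([^;]*);.*/$2/ written with Python's re.sub and \2.
substituted = re.sub(r'([^;]*;){3}([^;]*);.*', r'\2', s)          # '"Name"'

# The repeated capturing group only remembers its LAST repetition.
m = re.fullmatch(r'([^;]*;){3}([^;]*);.*', s)
last_repeat, field = m.group(1), m.group(2)                        # '"Vorname";', '"Name"'

# Non-capturing repetition: the field is the first (and only) group.
only_group = re.fullmatch(r'(?:[^;]*;){3}([^;]*);.*', s).group(1)  # '"Name"'
```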