Sitemap/JSON error

The source data is okay. A label on the sitemap is showing the block as per OP.
The regexp works in regex101 with the data copied from my sitemap output as the source.
The only difference is that I’m placing the regex in a rule and adding an escape “\” for the greedy whitespace “\s+”.
I’ve used Linux and programmed in it for 30 years mate, including regex.
It is OH that is the issue (or my use of it), and the fact that the functionality of OH comes through complementary code, so they may not be a 1:1 match.

Again, you are using the openHAB REGEX transformation service, not plain REGEX.

So, that’s been processed by the sitemap and then by your browser and then copy pasted into another browser for further processing. It is not your raw source. I’m still afeared of hidden control characters.

Yes, sorry, I was just editing my post before yours to concur with you.
I’ve read the OH regex docs; I shall re-read them.

EDIT: The OH docs even refer to the online validator - the very one I used with success. So it has to be the source, as you say possibly containing ctrl chars. I will rework the regex pattern.

I wouldn’t say I like saying it. I’m forced to say it often because for some reason OH decided to make REGEX work differently from normal so you will find lots of users say things like “it works on regex101.com but not in openHAB.”

Usually the first three or four lines of the exception are sufficient.

I don’t use JSONPATH that much, but what you really want is to extract the value 20, not create a new JSON string { 'Motor current below expectation' : 20 }. When you use the square brackets in the way you are, you get back the latter. To get just the value you would use $.Motor current below expectation. You might need to escape the spaces.

For details see the JsonPath Transform docs and in particular click on the links.
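As a rough illustration of the difference between extracting a bare value and getting an object back, here is a minimal Python sketch. It uses Python’s standard json module, not openHAB’s JSONPATH transformation, and the two-key payload is a shortened, hypothetical stand-in for the real message.

```python
import json

# Shortened stand-in for the MQTT payload; tabs and newlines are kept on purpose.
payload = '{\n\t"Motor current below expectation":\tfalse,\n\t"Motor current always high":\tfalse\n}'

# Whitespace between JSON tokens is legal, so newlines and tabs do not break parsing.
data = json.loads(payload)

# Looking the key up directly yields the bare value, which is what a label needs.
value = data["Motor current below expectation"]
print(value)  # False

# Building a one-entry object instead yields a structure, not a displayable value.
subset = {"Motor current below expectation": value}
print(json.dumps(subset))
```

The first print is the kind of result an Item state or label can use directly; the second is another JSON string that would need parsing again.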

No, it contains a JSON string that encodes two different values, “Motor current below expectation” and “Motor voltage”. I like rossko57’s term. That is unconventional in openHAB. Typically you would be splitting those two values into two separate Items at the MQTT Channel level, not leaving the values embedded into a single Item and splitting the values out at the Sitemap level. That is the part that is unconventional.

But unconventional doesn’t mean it won’t work. It’s just unusual, and ultimately if you want to use these values anywhere else in openHAB, it will be significantly more work in the long run because you will have to write rules to split the values out of the one Item each and every time.

As rossko57 indicated, REGEX in OH is slightly different from normal REGEX, at least when used from the transformation service. In that case your expression must match the entire string, and only the first group is what gets returned by the transformation. And by entire String I mean the whole thing, not just the line. If your String has multiple lines you must match all of them.
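The behaviour described here can be approximated outside openHAB. The following Python sketch is an emulation under stated assumptions, not openHAB’s actual code: the helper name oh_style_regex is hypothetical, and the use of re.DOTALL (so that “.” also crosses newlines) is an assumption about how multi-line input is handled.

```python
import re
from typing import Optional


def oh_style_regex(pattern: str, text: str) -> Optional[str]:
    """Approximate the described rule: match the whole input, return group 1."""
    # re.DOTALL is assumed here so that '.*' can span a multi-line input.
    match = re.fullmatch(pattern, text, flags=re.DOTALL)
    if match is None or match.re.groups < 1:
        return None
    return match.group(1)


text = 'first line\n"Motor voltage": 3\nlast line'

# Matches only part of the input, so the whole-string rule rejects it.
print(oh_style_regex(r'"Motor voltage": (\d+)', text))      # None

# Padded with .* on both sides, the whole input matches and group 1 comes back.
print(oh_style_regex(r'.*"Motor voltage": (\d+).*', text))  # 3
```

A tester such as regex101 searches for the pattern anywhere in the text, which is why the unpadded pattern appears to work there.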

Thanks muchly Ritch. The square brackets are needed in JSON to reference a key with spaces in it.
However, I have moved away from using JSON to parse my data because it contains control chars, certainly newlines.

Notwithstanding design decisions as to where things “should” go, let us focus on the very simple regexp, please, and understand if we can why it is not working. It shouldn’t matter where the expr is, so let’s just use this (bad) example, eh? Once it is working I will correct the design.

So I moved to REGEX with a very simple pattern…
(?:"Motor current below expectation"[^a-z]*?)(true|false)

Meaning:

  • Non-capturing group (?:"Motor current below expectation"[^a-z]*?)
  • "Motor current below expectation" matches the characters "Motor current below expectation" literally (case sensitive)
  • Match a single character not present in the list below [^a-z]*?
  • *? Quantifier — Matches between zero and unlimited times, as few times as possible, expanding as needed (lazy)
  • a-z a single character in the range between a (index 97) and z (index 122) (case sensitive)
  • 1st Capturing Group (true|false)
  • 1st Alternative true
  • true matches the characters true literally (case sensitive)
  • 2nd Alternative false
  • false matches the characters false literally (case sensitive)

Using the text in the OP, and including newlines and tabs, this expression correctly parses the text.

So the question is, and help is needed: why is the regexp not working in OH, when capture group 1 correctly yields “false”? Whilst the regexp can be improved to better guard against bad data, it is very simple and it works, so why not in OH? What exactly is OH refusing to parse in this very simple regexp?

P.S. Ritch,
I do not understand what you mean by:

In that case your expression must match the entire string and only the first group is what gets returned by the transformation. And by entire String I mean the whole thing, not just the line. If your String has multiple lines you must match all of them.

What is the point of a regexp that must match the entire data, rather than just a string? And what is the “first group” in OH terms?

REGEX appears to work very differently from how it should, so I will go back to JSON. I’m almost there, because it recognised the structure. From openhab.log:

2020-01-13 16:40:56.348 [WARN ] [ofiles.JSonPathTransformationProfile] - Could not transform state '{
        "Motor current below expectation":      false,
        "Motor current always high":    false,
        "Motor taking too long":        false,
        "discrepancy between air and pipe sensors":     false,
        "air sensor out of expected range":     false,
        "pipe sensor out of expected range":    false,
        "low power mode is enabled":    false,
        "no target temperature has been set by host":   false,
        "valve may be sticking":        false,
        "valve exercise was successful":        false,
        "valve exercise was unsuccessful":      false,
        "driver micro has suffered a watchdog reset and needs data refresh":    false,
        "driver micro has suffered a noise reset and needs data refresh":       false,
        "battery voltage has fallen below 2p2V and valve has been opened":      false,
        "request for heat messaging is enabled":        false,
        "request for heat":     false
}' with function '$.<jsonPath>' and format '<valueFormat>'

Reading this thread, “Problem with spaces in json keys”, it appears I need a rule to strip out newlines etc. Blimey, hard work this OH. I will go back to regexp.

OK, so I have moved the transformation to the channel level:

	String i_TRV_Lounge_Diag1 { 
		channel="mqtt:topic:b_MQTT_Broker:t_TRV_Lounge:c_Diagnostics" [profile="transform:REGEX", function=".*?\"Motor current below expectation\"[^a-z]*?(true|false)"] 
		}

and linked this item to a label on the sitemap, and the label displays:

Motor current below expectation.

Nearly there; at least it recognises a portion, albeit not the one I want. I cannot find the full syntax reference doc for OH REGEX (since it isn’t the same as “proper” regex).
It appears to me that it is returning the matched string verbatim, which is not what I want, as I have included a group to match either true or false, the latter being what I want to obtain.

As I vaguely understand it, the main difference in the transformation services is that they are required to return a single string. For example, as a label or as an Item update, it’s no use returning an array, or a JSON object, or a list.
That constraint limits the things that are sensible to do; for example, matching just “motor” in your data would return a list of multiple matches.

I repeat my advice to take baby steps, do your tests in a rule where you can supply faked test data, examine individual character by character if you have to, etc.
Sticking it in a binding or profile transform cripples your opportunities to investigate.
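The same character-by-character inspection can be tried outside openHAB on a copy of the raw payload. This is a minimal Python sketch, not an openHAB rule; the function name describe_chars and the fake test string are made up for illustration.

```python
import unicodedata


def describe_chars(text: str) -> list[str]:
    """Return one line per character: index, code point, and a readable name."""
    lines = []
    for index, char in enumerate(text):
        # Control characters have no Unicode name, so fall back to a marker.
        name = unicodedata.name(char, "<control or unnamed>")
        lines.append(f"{index:3d} U+{ord(char):04X} {name}")
    return lines


# Faked test data containing a newline and tabs, as in the logged payload.
fake = '{\n\t"request for heat":\tfalse\n}'
for line in describe_chars(fake):
    print(line)
```

Any line showing a code point below U+0020 marks a control character that a browser copy-paste may have hidden or altered.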

Because, like rossko57 and I have said, and I quote:

In that case your expression must match the entire string and only the first group is what gets returned by the transformation. And by entire String I mean the whole thing, not just the line. If your String has multiple lines you must match all of them.

Your expression only matches one line from the String, not the whole String. You need something to match the rest of the string. Often a .* is sufficient. I would probably use something like:

.*Motor current below expectation\"[^a-z]*(true|false).*

Obviously I didn’t test the above, just typed it in. It’s for illustrative purposes.
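One detail worth checking when typing such patterns by hand: square brackets define a character class, so [true|false] matches a single character, whereas (true|false) matches either whole word. A small Python sketch of the difference, using a one-line stand-in for the payload and re.fullmatch to mimic the whole-string rule (an approximation, not openHAB itself):

```python
import re

line = '"Motor current below expectation":\tfalse,'

# Square brackets: a character class, so the group captures a single character.
single = re.fullmatch(r'.*expectation".*([true|false]).*', line, flags=re.DOTALL)
print(single.group(1))  # e

# Parentheses with '|': alternation, so the group captures a whole word.
word = re.fullmatch(r'.*expectation"[^a-z]*(true|false).*', line, flags=re.DOTALL)
print(word.group(1))  # false
```

Both patterns match the whole line, so neither fails outright; only the captured text differs, which makes this slip easy to miss.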

Create an issue on GitHub and ask the devs. All I can say is how it does work in OH. I’m certain they have very specific technical or usability reasons why it is implemented this way.

The first group is the first set of ( ). It only ever makes sense, in my experience, to have one set of parens in the REGEX transformation. Put them around what you want returned. You have two sets of parens so it’s returning what the first one captures.
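For reference, here is a small Python sketch of how groups are numbered. Python and Java regexes both number only capturing parentheses, left to right; a (?:...) group is not counted. The sample text is a hypothetical one-line payload.

```python
import re

text = '"Motor taking too long":\tfalse'

# Two capturing groups: group 1 is the key text, group 2 is the value.
two = re.fullmatch(r'"(Motor taking too long)"[^a-z]*(true|false)', text)
print(two.group(1))  # Motor taking too long
print(two.group(2))  # false

# (?:...) is non-capturing, so here the value is group 1.
one = re.fullmatch(r'(?:"Motor taking too long"[^a-z]*)(true|false)', text)
print(one.group(1))  # false
```

Keeping a single capturing group around the wanted value, as suggested above, avoids having to count.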

This is waaaay over the top and time consuming to get working. I’ve spent all day on this, taking small steps, but nothing I do works. So I’m going back to base 1 and just linking a sitemap label to the diagnostic item. It displays the full (formatted, I suspect) JSON, but at least the values are there.
Thank you for trying.