Extract value from HTML

Hi,
I have already read a lot of topics but cannot get it working the way I want.
I have a local webpage containing some HTML text.
Example:

 <div>
            <tr >
        <td>502000026870-1 </td>
        <td> 30 W </td>
        <td rowspan=4 style='vertical-align: middle;'> 50.0 Hz </td>
        <td> A: 223 V </td>
        <td rowspan=4 style='vertical-align: middle;'> 23 &#176;C </td>
        <td rowspan=4 style='vertical-align: middle;'> 2021-08-18 09:35:45
 </td>
      </tr>
            <tr >
        <td>502000026870-2 </td>
        <td> 30 W </td>
        <td> B: 223 V </td>
      </tr>
            <tr >
        <td>502000026870-3 </td>
        <td> 29 W </td>
        <td> C: 223 V </td>
      </tr>
            <tr >
        <td>502000026870-4 </td>
        <td> 2 W </td>
        <td>- </td>
      </tr>
          </div>
        </tbody>
  </table>
</div>            </div>
          </div>
    	</article>
      </div>
    </section>

I am interested in getting the values in W it contains.
Using https://regex101.com/ I found that the regex .\d+.W.</td> should match my values.

I’m using OH3 and have created an HTTP Thing.

I have created a Channel with a transformation.

I have installed the REGEX and JS transformations, and in the transformation folder I wrote this:

(function(i) { 
    var re = new RegExp("<td>.\d+.W.<\/td></td>");
    var out = i.match(re)[1]; 
    return parseFloat(out.replace(',', '.')); 
})(input)

But I do not get the values I need. The error is:
Executing transformation ChannelStateTransformation{pattern=‘sol.js’, serviceName=‘JS’} failed: An error occurred while executing script. TypeError: Cannot get property “1” of null in at line number 3

I understand the error means that the regex does not find a value, but I’m not able to solve it.

Could you help me?

First, the transformations will return a single value. You will need a separate Channel with a separate transformation for each value you want to extract.

The JS transformation and the REGEX transformations are completely separate from each other. Since you are using REGEX, just use the REGEX transformation directly and forget the JS.
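If the JS route is still wanted, the script above seems to fail for two reasons: inside a string literal "\d" collapses to a plain d, and the pattern has no capturing group (plus a doubled </td>), so match() returns null and [1] throws. Below is a minimal sketch of a corrected version, written as a plain function so it can be tried outside openHAB. The name extractWatts and the sample string are made up for illustration, and it only returns the first W value it finds.

```javascript
// Hypothetical helper; in a JS transformation file the body would be wrapped
// as (function(i) { ... })(input), like the original script.
function extractWatts(html) {
    // A regex literal keeps \d intact; (\d+) is the capturing group.
    var re = /<td>\s*(\d+)\s*W\s*<\/td>/;
    var m = html.match(re);
    // match() returns null when nothing is found, so guard before indexing.
    return m === null ? null : parseFloat(m[1]);
}

// Shortened sample modelled on the first table row of the page.
var sample = "<td>502000026870-1 </td>\n<td> 30 W </td>\n<td> A: 223 V </td>";
console.log(extractWatts(sample));      // 30
console.log(extractWatts("no table"));  // null
```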

One thing that is unique to openHAB’s REGEX is that the expression must match the entire String. The transformation will then return the first matching group.
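This is why a pattern that looks fine on regex101 can return nothing in openHAB. The sketch below imitates that behaviour in plain JavaScript. It assumes the REGEX transformation works like a whole-string match in which "." also matches newlines; the helper name ohRegex and the shortened page string are made up for the example. In the Channel itself the pattern is written with single backslashes; they are only doubled here because of the JavaScript string literal.

```javascript
// Assumption: openHAB's REGEX acts like a whole-string match where "." also
// matches newlines. Emulated here with ^...$ and the "s" (dotAll) flag.
function ohRegex(pattern, input) {
    var m = new RegExp("^(?:" + pattern + ")$", "s").exec(input);
    return m === null || m[1] === undefined ? null : m[1];
}

var page = "<tr>\n<td>502000026870-1 </td>\n<td> 30 W </td>\n</tr>";

// Substring search, as on regex101: finds " 30 W </td>".
console.log(/.\d+.W.<\/td>/.test(page));        // true
// The same pattern as a whole-string match: no result.
console.log(ohRegex(".\\d+.W.</td>", page));    // null
// Padded with .* on both sides and given a group: returns the number.
console.log(ohRegex(".*?(\\d+)\\sW.*", page));  // "30"
```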

So to get the 30 W value after 502000026870-1 you’d use REGEX:.*502000026870-1.*(\d+)\sW.*

The group is defined by the ( ). So that expression matches everything using .* up to “502000026870-1”, then matches everything up to the first digit. The + matches one or more digits. The \d+ is in () so just the number, in this case 30, will be returned. The \s matches the space and the W matches the units and the .* matches the rest of the web page.

In general an OH REGEX will start with .*, then have something unique to identify the start of what needs to be returned, parens to capture what needs to be returned, and then something unique to define the end of what needs to be returned.

Before you do any of that, you might like first to add a temporary test channel of string type with no transformations, link to a test Item, and make sure you get the webpage you expect.

Thank you, I will try. I already have a dummy Item and I get the entire HTML code in that.
But where should I put the

REGEX:.*502000026870-1.*(\d+)\sW.*

Directly in the Channel?

Did you try it? Yes, that’s where it goes.

Yes it works, Thank you

Small detail if someone reads this later: there was a ? missing. It should be:
REGEX:.*502000026870-1.*?(\d+)\sW.*
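To see why the ? matters: a greedy .* after the serial number swallows as much as it can and backtracks only as far as it must, so the group ends up holding a single digit from the last "W" cell on the page. The lazy .*? stops at the first number followed by " W". The sketch below shows the difference in plain JavaScript, using a shortened copy of the table and assuming whole-string matching with "." matching newlines (emulated with ^...$ and the "s" flag).

```javascript
// Shortened copy of the four table rows from the page.
var page = [
    "<tr><td>502000026870-1 </td><td> 30 W </td></tr>",
    "<tr><td>502000026870-2 </td><td> 30 W </td></tr>",
    "<tr><td>502000026870-3 </td><td> 29 W </td></tr>",
    "<tr><td>502000026870-4 </td><td> 2 W </td></tr>"
].join("\n");

// Greedy: .* runs to the end and backtracks to the LAST "<digit> W".
var greedy = /^.*502000026870-1.*(\d+)\sW.*$/s;
// Lazy: .*? grows one character at a time and stops at the FIRST "<digits> W".
var lazy = /^.*502000026870-1.*?(\d+)\sW.*$/s;

console.log(greedy.exec(page)[1]);  // "2"  (from the last row, not row 1)
console.log(lazy.exec(page)[1]);    // "30" (the value next to 502000026870-1)
```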