Problem with HTTP binding and XPATH parsing HTML

Hi All,

I am trying to get the water temperature from this website https://www.nid.bayern.de/wassertemperatur//isar/gmund_tegernsee-18201303 by using the HTTP binding and an XPATH transformation to retrieve the value from the first row of the table at the bottom of the page.

I am using the following syntax in my items file:

Number num_WasserTemp_Tegernsee "Wassertemperatur Tegernsee[%.1f °C]" (gPersistance) [ "CurrentTemperature" ] { http="<[https://www.nid.bayern.de/wassertemperatur//isar/gmund_tegernsee-18201303:300000:XPATH(/html/body/div/div[2]/div[3]/div/div[2]/table/tbody/tr[1]/td[2]/text())]" }

(XPATH taken from Firefox DevTools, also tried the //table/tbody… syntax).

Unfortunately, I get errors in openhab.log:

org.openhab.core.transform.TransformationException: transformation throws exceptions

I already tried to validate the XPath on this page https://www.freeformatter.com/xpath-tester.html, but it does not seem to accept the HTML input from the website (which could be another hint as to why this is failing).

Can somebody help me with this issue?

Many thanks
Jesko

Hey Jesko,

Apart from the fact that this probably isn't the recommended way (but mostly sufficient for an old-school Bavarian website, I think :stuck_out_tongue:), you could use RegEx in your scenario. As I'm not really experienced with this type of transformation, this might not be the nicest way of doing it, but maybe it gives you a starting point:

(?isU)<table>|(?:<td\s\sclass="center">)(.*)(?:<\/td>)

You can check your expression here: https://regex101.com/

Have a nice day!

Hi - the website does not require authentication - I will try the regex approach, but the syntax is from hell :slight_smile:

Thanks Rainer,

that makes sense, but I am struggling with the syntax of REGEX - is this correct:

Number num_WasserTemp_Tegernsee "Wassertemperatur Tegernsee[%.1f °C]" (gPersistance) [ "CurrentTemperature" ] { http="<[https://www.nid.bayern.de/wassertemperatur//isar/gmund_tegernsee-18201303:300000:REGEX((?isU)|(?:<td\s\sclass="center">)(.*)(?:</td>)) ]" }

The point was that HTML is not XML and as a consequence XPATH will not work for webpage scraping.
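To illustrate the point, here is a small Python sketch (not openHAB code). It assumes that the XPATH transformation relies on a strict XML parser, so any page that is not well-formed XML is rejected before the XPath is even evaluated. Both snippets are made up for the example and are not taken from the real page.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Hypothetical snippets, not copied from the real website.
xhtml_like = '<table><tr><td class="center">15.3</td></tr></table>'
typical_html = '<table><tr><td class="center">15.3<br></td></tr></table>'

print(is_well_formed_xml(xhtml_like))    # well-formed, so an XPath could be applied
print(is_well_formed_xml(typical_html))  # the unclosed <br> makes the XML parser give up
```

Browsers silently repair markup like the second snippet, which is probably why an XPath copied from the browser DevTools looks fine there but fails in a strict parser.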

I'm not using RegEx in my setup right now, but based on the documentation, your item definition should be correct. If it's not working, try to escape the " inside the expression with a backslash. If you're using it in a rule, it would be something like:

transform("REGEX", "(?isU)<table>|(?:<td\s\sclass=\"center\">)(.*)(?:<\/td>)", num_WasserTemp_Tegernsee)

You might have to transform the result into a number, so that your item can accept it.
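As a rough illustration of that last step, here is a Python sketch (not openHAB rule code). It assumes the scraped text uses a decimal comma, as German pages often do, and contains no thousands separators; `to_number` is a made-up helper name.

```python
def to_number(raw: str) -> float:
    """Convert scraped text such as ' 15,3 ' into a float.

    Assumes a decimal comma may be used and no thousands separators occur.
    """
    return float(raw.strip().replace(",", "."))


print(to_number(" 15,3 "))
print(to_number("15.3"))
```

In a rule, the same idea would be to clean up the string first and only then post it to the Number item.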

Thanks Rainer, I can't get the syntax right in the items file:

Number num_WasserTemp_Tegernsee "Wassertemperatur Tegernsee[%.1f °C]" (gPersistance) [ "CurrentTemperature" ] { http="<[https://www.nid.bayern.de/wassertemperatur//issar/gmund_tegernsee-18201303:300000:REGEX((?isU)|(?:<td\s\sclass="center">)(.*)(?:)) ]" }

This throws the following error:

2020-09-24 11:45:43.997 [WARN ] [el.core.internal.ModelRepositoryImpl] - Configuration model ‘external_WebData.items’ has errors, therefore ignoring it: [4,118]: mismatched character ‘s’ expecting set null
[4,223]: mismatched input ‘’ expecting RULE_STRING

Have you already tried escaping the "? If this doesn't help, I'll create a dummy item later and check it myself.

Number num_WasserTemp_Tegernsee "Wassertemperatur Tegernsee[%.1f °C]" (gPersistance) [ "CurrentTemperature" ] { http="<[https://www.nid.bayern.de/wassertemperatur//issar/gmund_tegernsee-18201303:300000:REGEX((?isU)|(?:<td\s\sclass=\"center\">)(?:<\/td>))]"}

I have added a \ in front of each ? but it's still not working:

2020-09-24 12:20:16.873 [WARN ] [el.core.internal.ModelRepositoryImpl] - Configuration model ‘external_WebData.items’ has errors, therefore ignoring it: [4,118]: mismatched character ‘?’ expecting set null
[4,211]: mismatched input ‘isU’ expecting RULE_STRING
[4,234]: mismatched character ‘?’ expecting set null
[4,262]: mismatched character ‘’ expecting ‘"’

Hmm, I don't know… there are so many characters to escape in OH because of its string handling that more complex expressions get messed up… so for a quick win, you could use this. It works, but it would break very easily if anything in the current site structure changes.

Number watertemp "Wassertemperatur Tegernsee [%s]" { http="<[https://www.nid.bayern.de/wassertemperatur/issar/gmund_tegernsee-18201303:10000:REGEX(.*?<td\\s\\sclass=\"center\">(.*?)</td>(.*))]" }
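For anyone puzzled by the doubled backslashes: the regex sits inside a double-quoted string in the items file, so judging by the item above, every backslash has to be doubled and every " has to be escaped. A small Python sketch of that conversion (`escape_for_items_string` is a made-up helper, not part of openHAB):

```python
def escape_for_items_string(pattern: str) -> str:
    """Escape a raw regex so it can sit inside a double-quoted string.

    Backslashes are doubled first, then double quotes are escaped.
    """
    return pattern.replace("\\", "\\\\").replace('"', '\\"')


# The regex as you would test it on regex101.com
raw_regex = r'.*?<td\s\sclass="center">(.*?)</td>(.*)'

# Prints the form used inside the http="..." string of the item above
print(escape_for_items_string(raw_regex))
```

The order matters: escaping the quotes first would cause the backslashes added for them to be doubled as well.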

Good luck

One thing that is important to understand about the REGEX transform is that the pattern needs to match the entire String, in this case the entire HTML document. The first matching group (i.e. first set of parens) is what gets returned if the expression matches the whole String.
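A small Python sketch of that behaviour, for illustration only. openHAB uses Java regular expressions, so `re.fullmatch` with `re.DOTALL` is just my approximation of "must match the entire String". The HTML fragment is made up and `regex_transform` is a hypothetical helper.

```python
import re

# Hypothetical page fragment; the real page's markup may differ.
html = (
    "<html><body>\n"
    "<table>\n"
    '<tr><td  class="center">15,3</td><td  class="center">15,1</td></tr>\n'
    "</table>\n"
    "</body></html>\n"
)


def regex_transform(pattern: str, text: str):
    """Return group 1 if the pattern matches the whole text, else None."""
    match = re.fullmatch(pattern, text, flags=re.DOTALL)
    return match.group(1) if match else None


# Covers only a single table cell, so it cannot match the whole document.
partial = r'<td\s\sclass="center">(.*?)</td>'
# Leading .*? and trailing (.*) absorb everything around the first cell.
whole = r'.*?<td\s\sclass="center">(.*?)</td>(.*)'

print(regex_transform(partial, html))  # no match, nothing is returned
print(regex_transform(whole, html))    # text of the first matching cell
```

This is why the working item above wraps the interesting part in `.*?` and `(.*)`: they soak up the rest of the page so that the first group holds only the value.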