Extracting data from an HTML page with regex

dirk_sh · January 23, 2020, 1:12pm

Hello,

I have a question about how to extract data from an HTML file with regex. My ESP8266 with a BME680 sensor delivers its readings as an HTML page.

The output is like this:

<!DOCTYPE html><html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta name="viewport" content="width=device-width, initial-scale=1">

<meta http-equiv="refresh" content="30"></head>
<body><h1>BME680 Raum 1</h1>
<div><p>Temperatur: 
21.30
°C</p><p>Luftfeuchtigkeit: 
58.76
%</p><p>Luftdruck: 
969.59
hPa</p><p>Gaswiderstand: 
19.02
KOhms</p><p>Luftqualität: 
26
IAQ</p><p>Meeresspiegel: 
370.96
m</p></div>

</body></html>

Now I want to get the temperature value out of the file.

How can I get only the number after “Temperatur:” and not the rest of the numbers?

Thanks for any tips or help.

Cheers
Dirk

Platform information:

Hardware: Raspberry 3b
OS: openhabian
openHAB version: 2.5.1-2

rlkoshak · January 23, 2020, 4:52pm

Regex works a little differently in openHAB than it does elsewhere: in openHAB the expression has to match the entire String. You then use parens to mark the part of the String you want returned.
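As a rough illustration of the whole-string rule, here is a small Python sketch. It assumes that Python's `re.fullmatch` is a fair stand-in for how openHAB applies the expression; the helper name `first_group` and the single-line sample string are made up for this example.

```python
import re

def first_group(pattern, text):
    """Return capture group 1 if the pattern matches the ENTIRE text, else None."""
    m = re.fullmatch(pattern, text)
    return m.group(1) if m else None

sample = "Temperatur: 21.30 °C"  # made-up single-line input

# The pattern covers only the number, not the whole string: no match.
partial = first_group(r"(\d+\.\d+)", sample)

# Wrapped in .* on both sides, the pattern covers the whole string,
# and the parens pick out the part to return.
whole = first_group(r".*Temperatur: (\d+\.\d+).*", sample)

print(partial, whole)
```

The first call returns nothing because the expression leaves the rest of the string unmatched; the second returns only what is inside the parens.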

Knowing this, you can now go and use one of the many online regular expression testers like regex101.com to experiment and come up with the proper expression. It will look something like

.*Temperatur:(\d+\.\d\d).*

The tricky part is those newlines: you need to figure out whether they are just \n, or \r\n, or something else.
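One way to sidestep the \n versus \r\n question is to let `\s*` absorb whatever whitespace follows the label. The sketch below tries this in Python against a shortened copy of the page from the first post, with `re.DOTALL` set so that `.` also crosses line breaks. Whether openHAB treats `.` the same way is an assumption here, so the final expression should still be tested in openHAB itself. The name `extract_temperature` is made up for this example.

```python
import re

# Shortened copy of the sensor page, once with \n and once with \r\n line endings.
PAGE_LF = (
    "<body><h1>BME680 Raum 1</h1>\n"
    "<div><p>Temperatur: \n21.30\n°C</p><p>Luftfeuchtigkeit: \n58.76\n%</p></div>\n"
    "</body></html>"
)
PAGE_CRLF = PAGE_LF.replace("\n", "\r\n")

# \s* swallows the space and the line break (either style) after the label.
# DOTALL lets the leading and trailing .* run across line breaks.
PATTERN = re.compile(r".*Temperatur:\s*(\d+\.\d+).*", re.DOTALL)

def extract_temperature(html):
    """Return the number after 'Temperatur:' as a string, or None if absent."""
    m = PATTERN.fullmatch(html)
    return m.group(1) if m else None

print(extract_temperature(PAGE_LF))
print(extract_temperature(PAGE_CRLF))
```

Because the label is part of the expression, the other readings on the page (humidity, pressure and so on) are skipped, which is what the original question asks for.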