Parsing an HTTP page: how to?

Windrad · January 8, 2017, 3:14pm

I would like to get a value from a web page. The downloaded HTML is a 60K file that has to be searched for a certain position; a value some lines further down then has to become an Item in openHAB 2.
I read the wiki page of the HTTP binding and some other material I found on the net, but I still have no clue how to parse the HTML to get the specific value out.
The HTML contains several parts similar to the one below. The word “Tihange” makes the part I want to parse unique.

        var canvas = document.getElementById('Tihange');
        var data = {
        labels: ["TDRM4", "TDRM10", , , , ],
        datasets: [
            {
                label: string_current,
                backgroundColor: "rgba(153,204,255,1.0)",
                borderColor: "rgba(153,204,255,1)",
                borderWidth: 2,
                hoverBackgroundColor: "rgba(153,204,255,1)",
                hoverBorderColor: "rgba(153,204,255,1)",
                data: [0.191, 0.197, ],
            },
            {
                label: string_max,
                backgroundColor: "rgba(153,255,0,0.7)",
                borderColor: "rgba(153,255,0,1)",
                borderWidth: 2,
                hoverBackgroundColor: "rgba(153,255,0,0.4)",
                hoverBorderColor: "rgba(153,255,0,1)",
                data: [0.224, 0.217, ],
            },

I would like to have the 0.224 after “data: [” in an openHAB 2 Item.
Is there anybody out there who can code that or point me in the right direction?
I know this question is asking a bit too much, but maybe someone already has something similar in his/her portfolio?
Thanks,
Ingo

rlkoshak · January 9, 2017, 6:58pm

You do indeed use the HTTP binding to pull down the full text of the web page. From there it becomes a lot more complicated because you are not parsing something that is pure JSON or XML. So your best bet is to use regular expressions and the REGEX transformation service.

Regular expressions are hard to get right, so you will probably want to read up on how they work and experiment. There are online regex testers where you can paste in the text of your web page, write a regular expression, and see what the expression returns.

It will likely be something along the lines of:

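REGEX(.*?Tihange.*?data: \[.*?data: \[([0-9.]+),.*)

What the above says is to return the number between “[” and “,” in the second “data: [” that occurs after “Tihange”. The “.*?” parts are non-greedy, so the match settles on the first chart that fits rather than the last one on the page.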
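
To experiment offline before wiring anything into openHAB, a minimal Python sketch along these lines may help. It assumes that the REGEX transformation matches the whole input, lets “.” also match newlines, and returns the first capture group; check that against the transformation's documentation. The sample page is a shortened stand-in: only the “Tihange” chart comes from the post above, while “SiteA” and “SiteB” are made-up neighbours added to show why the expression should be anchored on “Tihange” and use non-greedy quantifiers.

```python
import re

# Hypothetical stand-in for the downloaded page. Only the 'Tihange' chart
# comes from the thread; 'SiteA' and 'SiteB' are invented neighbours.
SAMPLE_PAGE = """
var canvas = document.getElementById('SiteA');
var data = {
datasets: [
    { label: string_current, data: [0.101, 0.102, ], },
    { label: string_max, data: [0.111, 0.112, ], },
var canvas = document.getElementById('Tihange');
var data = {
labels: ["TDRM4", "TDRM10", , , , ],
datasets: [
    { label: string_current, data: [0.191, 0.197, ], },
    { label: string_max, data: [0.224, 0.217, ], },
var canvas = document.getElementById('SiteB');
var data = {
datasets: [
    { label: string_current, data: [0.301, 0.302, ], },
    { label: string_max, data: [0.311, 0.312, ], },
"""

# Non-greedy and anchored on 'Tihange': skip to the chart, then to the
# second "data: [" after it, and capture the first number.
ANCHORED = r".*?Tihange.*?data: \[.*?data: \[([0-9.]+),.*"

# Fully greedy and unanchored variant, kept only for comparison.
GREEDY = r".*data: \[.*data: \[(.*),.*"


def regex_transform(pattern, source):
    """Return capture group 1 if the pattern matches the whole text, else None.

    Assumption: this mirrors how the REGEX transformation behaves
    (whole-input match, '.' matching newlines, first group returned).
    """
    match = re.fullmatch(pattern, source.strip(), re.DOTALL)
    return match.group(1) if match else None


print(regex_transform(ANCHORED, SAMPLE_PAGE))      # 0.224
print(repr(regex_transform(GREEDY, SAMPLE_PAGE)))  # text from a later chart
```

On a page with several charts, the greedy variant drifts towards the last “data: [” pair and captures far more than one number, which is why the anchored, non-greedy form is the safer starting point.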

Windrad · January 9, 2017, 10:17pm

Thanks Rich, I will give it a try in the next few days. Some months ago I attended a Linux course where we got a “short impression” of regular expressions, so I should be able to try.
Up to now I was able to avoid regular expressions. Those times seem to be over…