Using Web Scraping (HTML Wrapper/XPATH) on a Website with authentication

Hey OH-Forum,
A friend’s website lists some data that I want to parse into my OH system.
I’m not a web developer and I’m not really into HTML, but is it possible to parse data with XPath from the HTML wrapper into an OH variable?
Or should I use a Python script?
The website also requires authentication, so I would need to log in every time my user is no longer authenticated.
Sadly, there’s no API or anything similar.
So is it even possible?
Thanks in advance

Only if the HTML is also well-formed XML. For example, HTML lets you use <br> to insert a line break, but when it is used like that there is no closing tag </br>, so XPath would not be able to parse it.

There are other features of HTML that may make it not match valid XML syntax which will break XPath as well.
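To make the difference concrete, here is a small Python sketch. It is not openHAB code: the sample markup and the `is_well_formed` helper are made up for illustration, and Python's strict XML parser is only a stand-in for what an XPath transform needs from the page.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML (hypothetical helper)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Made-up snippets: the first uses an unclosed <br>, the second closes it.
html_style = "<div><span id='temp'>21.5</span><br></div>"
xml_style = "<div><span id='temp'>21.5</span><br/></div>"

print(is_well_formed(html_style))
print(is_well_formed(xml_style))

# Once the markup is well formed, an XPath-style query can pick out the value.
root = ET.fromstring(xml_style)
value = root.find(".//span[@id='temp']").text
print(value)
```

If a check like this fails on your friend's page, XPath is probably not the right tool for it.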

But depending on the formatting the REGEX transform might work.
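As a rough illustration of the regex approach, here is a Python sketch. The page fragment and the "Outside temperature" label are invented, so the pattern would need adapting to the real page. A regular expression only looks at the text around the value, so the unclosed <br> does not matter.

```python
import re

# Made-up page fragment; note the unclosed <br>, which a regex does not care about.
page = "<p>Outside temperature: <b>21.5 &deg;C</b><br>Humidity: <b>48 %</b></p>"

# Capture the number that follows the label. The label text is an assumption.
pattern = re.compile(r"Outside temperature:\s*<b>([-+]?\d+(?:\.\d+)?)")

match = pattern.search(page)
temperature = float(match.group(1)) if match else None
print(temperature)
```

As far as I know, the REGEX transform expects the expression to cover the whole input and returns the first capture group, so there you would typically wrap the same pattern in `.*` on both sides. Check the transform's docs before relying on that.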

Does it use basic auth? If so, you can embed your username and password in the URL to log in, using the form https://username:password@host/path (all placeholders).

NOTE: You will want to be sure that the website uses HTTPS or else you will be passing your username and password in the clear.
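For what happens under the hood, here is a Python sketch of how a non-browser client can turn credentials embedded in a URL into a basic auth header. The `split_credentials` helper, the example.com URL, and the user/pass values are all placeholders I made up. This is not how the OH HTTP binding is implemented, just the general idea.

```python
import base64
from urllib.parse import urlsplit, urlunsplit
from urllib.request import Request


def split_credentials(url: str):
    """Split user:password out of a URL and return (clean_url, auth_header)."""
    parts = urlsplit(url)
    if parts.username is None:
        return url, None
    token = f"{parts.username}:{parts.password or ''}".encode("utf-8")
    header = "Basic " + base64.b64encode(token).decode("ascii")
    # Rebuild the URL without the user:password@ part.
    host = parts.hostname + (f":{parts.port}" if parts.port else "")
    clean = urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
    return clean, header


# Placeholder URL and credentials, for illustration only.
clean_url, auth = split_credentials("https://user:pass@example.com/data.html")
request = Request(clean_url, headers={"Authorization": auth})
# urllib.request.urlopen(request) would send it; it is not called here
# because example.com is only a placeholder.
print(clean_url)
print(auth)
```

The header is only base64 encoded, not encrypted, which is why the HTTPS note above matters.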

We don’t have enough information to say one way or the other really.

This option is being removed from more and more browsers… Chrome recently removed it… others are following.

The browser is a web client. The HTTP Binding and sendHttpRequest Actions are also clients and do not go through browsers. So unless and until that feature is removed from the server hosting the webpage or it is removed from the OH HTTP binding and sendHttpRequest Actions then it is irrelevant what Chrome or Firefox or any other browser allows or doesn’t allow.

Good to know… I got bitten by using a username/password URL in HABPanel in an image carousel widget.

I guess the difference is that HABPanel uses the browser as the client, while the HTTP binding acts as its own client.

Sorry if I caused any confusion.

That makes some sense, since HABPanel (all the UIs really, including the phone apps) runs in a browser, so it is the browser that is calling the URL.

Nothing to apologize for. There is a lot to keep track of.