Hi there,
After quite a bit of searching, I finally found a way to extract values from a website whose content is generated by JavaScript and inserted AFTER the page has loaded. My Synology can run Docker, so I installed a web scraper that renders a website and returns its content after a predefined wait time. This way I receive an HTML file that already contains the JavaScript-generated content, and I can read it through the http binding. I can now extract data from my Wi-Fi-enabled “Ofen-Innovativ” system, which does not come with an API:
Docker-Package: Splash (v3.5)
https://hub.docker.com/r/scrapinghub/splash
Address of the Splash HTTP service in my case: 192.168.1.8:28080
Items file (including regex):
// Binding format: <[url:refresh interval in ms:transformation]
// Door contact
String EG_Ofen_Contact "Türkontakt [%s]" <contact> {http="<[http://192.168.1.8:28080/render.html?url=http://192.168.6.7/&timeout=20&wait=8:60000:REGEX(.*ledTkUdaj\">(.*?)</span>.*)]"}
// Burn duration
String EG_Ofen_Abbranddauer "Abbranddauer [%s]" <time> {http="<[http://192.168.1.8:28080/render.html?url=http://192.168.6.7/&timeout=20&wait=8:60000:REGEX(.*casUdaj\">(.*?)</span>.*)]"}
// Air supply in percent
Number EG_Ofen_Luftzufuhr "Luftzufuhr [%d %%]" <wind> {http="<[http://192.168.1.8:28080/render.html?url=http://192.168.6.7/&timeout=20&wait=8:60000:REGEX(.*klapkaUdaj\">(.*?)%</span>.*)]"}
// Heating fault / alarm text
String EG_Ofen_Heizfehler "Alarm" <error> {http="<[http://192.168.1.8:28080/render.html?url=http://192.168.6.7/&timeout=20&wait=8:60000:REGEX(.*alarmUdaj\">(.*?)<br></span>.*)]"}
In the last item, the regex captures the text between
alarmUdaj">
and
<br></span>
All the other items work the same way.
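If you want to check a pattern before putting it into the items file, something like the following Python sketch might help. It is not part of my setup, just an illustration: it builds the same render.html URL as above and applies the regex to the whole page, which is how I understand the REGEX transformation to behave. The function names are made up for this example, and the sample HTML in the usage note is an assumption about what the stove page looks like, not a copy of it.

```python
import re
from urllib.parse import urlencode
from urllib.request import urlopen

SPLASH = "http://192.168.1.8:28080"  # Splash container address from this post
STOVE = "http://192.168.6.7/"        # stove web page from this post


def build_render_url(splash=SPLASH, target=STOVE, wait=8, timeout=20):
    """Build the Splash render.html URL with a URL-encoded target address."""
    query = urlencode({"url": target, "timeout": timeout, "wait": wait})
    return f"{splash}/render.html?{query}"


def extract(html, pattern):
    """Return the first capture group if the pattern matches the whole page, else None."""
    match = re.fullmatch(pattern, html, flags=re.DOTALL)
    return match.group(1) if match else None


def read_stove(pattern):
    """Fetch the rendered page from Splash and apply the pattern to it."""
    with urlopen(build_render_url(), timeout=30) as response:
        html = response.read().decode("utf-8", errors="replace")
    return extract(html, pattern)
```

For example, `extract('<span id="klapkaUdaj">45%</span>', r'.*klapkaUdaj">(.*?)%</span>.*')` gives `45`, and `read_stove(r'.*klapkaUdaj">(.*?)%</span>.*')` would run the same pattern against the live page. Note that the backslash before the quote in the items file is only there to escape the quote inside the binding string, so it is not needed in Python.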
I hope this is of use to anyone trying something similar.
Regards,
Nils