HTTP extract JS-generated content

niri · May 13, 2021, 8:25am

Hi there,

After quite a bit of searching I finally found a way to extract values from a website whose content is generated by JavaScript and inserted AFTER the page has loaded. I own a Synology with Docker support, so I installed a web scraper that renders a website and returns its content after a predefined wait time. This way I receive an HTML file that includes the JavaScript-generated content through the http binding. I can now extract data from my WiFi-enabled “Ofen-Innovativ” system, which does not come with an API:

Docker-Package: Splash (v3.5)
https://hub.docker.com/r/scrapinghub/splash

HTTP address of the Splash instance in my case: 192.168.1.8:28080
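Before wiring the Splash address into an items file, it can help to check by hand that the rendered page really contains the JavaScript-generated values. The following is a minimal Python sketch, not part of the original setup: the function names `splash_render_url` and `fetch_rendered_html` are made up for illustration, and the host, target URL, `timeout` and `wait` values are simply the ones used in this post.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def splash_render_url(splash_host: str, target_url: str,
                      timeout: int = 20, wait: int = 8) -> str:
    """Build a Splash render.html URL (hypothetical helper).

    The target URL is percent-encoded so that any query string of its own
    is not mistaken for Splash's parameters.
    """
    query = urlencode({"url": target_url, "timeout": timeout, "wait": wait})
    return f"http://{splash_host}/render.html?{query}"


def fetch_rendered_html(render_url: str, http_timeout: float = 30.0) -> str:
    """Download the rendered HTML from Splash (hypothetical helper).

    Not called here, because it needs a reachable Splash instance.
    """
    with urlopen(render_url, timeout=http_timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Addresses taken from this post; adjust them to your own network.
print(splash_render_url("192.168.1.8:28080", "http://192.168.6.7/"))
# prints http://192.168.1.8:28080/render.html?url=http%3A%2F%2F192.168.6.7%2F&timeout=20&wait=8
```

Calling `fetch_rendered_html` on that URL and searching the result for one of the element names (for example `alarmUdaj`) should show whether the `wait` value is long enough for the page's scripts to finish.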

Items file (including regex):

String    EG_Ofen_Contact                      "Türkontakt [%s]"                       <contact>                                          {http="<[http://192.168.1.8:28080/render.html?url=http://192.168.6.7/&timeout=20&wait=8:60000:REGEX(.*ledTkUdaj\">(.*?)</span>.*)]"}
String    EG_Ofen_Abbranddauer                 "Abbranddauer [%s]"                     <time>                                             {http="<[http://192.168.1.8:28080/render.html?url=http://192.168.6.7/&timeout=20&wait=8:60000:REGEX(.*casUdaj\">(.*?)</span>.*)]"}
Number    EG_Ofen_Luftzufuhr                   "Luftzufuhr  [%d %%]"                   <wind>                                             {http="<[http://192.168.1.8:28080/render.html?url=http://192.168.6.7/&timeout=20&wait=8:60000:REGEX(.*klapkaUdaj\">(.*?)%</span>.*)]"}
String    EG_Ofen_Heizfehler                   "Alarm"                                 <error>                                            {http="<[http://192.168.1.8:28080/render.html?url=http://192.168.6.7/&timeout=20&wait=8:60000:REGEX(.*alarmUdaj\">(.*?)<br></span>.*)]"}

In the last item, the regex extracts the text between

alarmUdaj">

and

<br></span>

All the other items work the same way.
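The extraction can be tried outside the items file with a short Python sketch. It assumes that the REGEX transformation has to match the whole response and returns the first capture group, which is emulated here with `re.fullmatch` and `re.DOTALL`. The helper name `extract_first_group` and the sample markup are invented for illustration; the real page of the stove will look different.

```python
import re
from typing import Optional


def extract_first_group(pattern: str, html: str) -> Optional[str]:
    """Return the first capture group if the pattern matches the whole text.

    DOTALL lets '.' cross line breaks, so the leading and trailing '.*'
    can swallow the rest of a multi-line HTML document.
    """
    match = re.fullmatch(pattern, html, flags=re.DOTALL)
    return match.group(1) if match else None


# Hypothetical sample markup, NOT the real output of the stove's web page.
SAMPLE_HTML = (
    '<html><body>\n'
    '<span id="klapkaUdaj">45%</span>\n'
    '<span id="alarmUdaj">OK<br></span>\n'
    '</body></html>'
)

# Same patterns as in the items file, without the \" escaping needed there.
print(extract_first_group(r'.*alarmUdaj">(.*?)<br></span>.*', SAMPLE_HTML))  # prints OK
print(extract_first_group(r'.*klapkaUdaj">(.*?)%</span>.*', SAMPLE_HTML))    # prints 45
```

If the element is missing from the rendered page, the function returns `None`, which is a quick hint that the page had not finished loading when it was captured.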

I hope this is of use to anyone trying something similar.

Regards,
Nils
