Extracting JS-generated content with the HTTP binding

Hi there,

After quite a bit of searching, I finally found a way to extract values from a website whose content is generated by JavaScript and inserted AFTER the page has loaded. Since I own a Synology with Docker support, I installed a web scraper that renders a website and returns its content after a predefined loading time. This way, the HTTP binding receives an HTML file that already contains the JavaScript-generated content. I can now extract data from my wifi-enabled “Ofen-Innovativ” system, which does not come with an API:

Docker-Package: Splash (v3.5)

http-path to Splash website in my case:
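
For anyone who wants to build such a path for their own device, here is a minimal Python sketch. It relies on Splash's render.html endpoint, which takes the page to render as the url parameter and the loading time in seconds as the wait parameter. The host name, the port 8050 (Splash's default), the device address and the wait time below are placeholders chosen for illustration, not the values of the setup described here.

```python
from urllib.parse import urlencode


def splash_render_url(splash_base, target_url, wait_seconds):
    """Build a Splash render.html URL that returns the page HTML after JS has run."""
    # urlencode percent-encodes the target URL so it survives as a query value
    query = urlencode({"url": target_url, "wait": wait_seconds})
    return f"{splash_base}/render.html?{query}"


# Placeholder addresses (assumptions, replace with your own Splash host and device)
print(splash_render_url("http://nas.local:8050", "http://192.168.1.50/", 5))
```

The resulting URL is what the HTTP binding would poll instead of the device address itself.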

Items-File (including regex):

String    EG_Ofen_Contact                      "Türkontakt [%s]"                       <contact>                                          {http="<[*ledTkUdaj\">(.*?)</span>.*)]"}
String    EG_Ofen_Abbranddauer                 "Abbranddauer [%s]"                     <time>                                             {http="<[*casUdaj\">(.*?)</span>.*)]"}
Number    EG_Ofen_Luftzufuhr                   "Luftzufuhr  [%d %%]"                   <wind>                                             {http="<[*klapkaUdaj\">(.*?)%</span>.*)]"}
String    EG_Ofen_Heizfehler                   "Alarm"                                 <error>                                            {http="<[*alarmUdaj\">(.*?)<br></span>.*)]"}

In the last item, the regex extracts the text between alarmUdaj"> and <br></span>.

All the other items work the same way.
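
To try out such a pattern before putting it into the items file, something like the following Python sketch may help. It mirrors the idea of the regexes above: capture whatever sits between the element name followed by "> and the closing </span>, optionally preceded by a suffix such as % or <br>. The HTML fragment is a made-up example; the real markup of the rendered page may look different, and the helper name extract is hypothetical.

```python
import re

# Hypothetical fragment of the rendered page; the real markup may differ.
SAMPLE_HTML = '<span id="klapkaUdaj">35%</span><span id="alarmUdaj">OK<br></span>'


def extract(html, element_id, suffix=""):
    """Return the text between `element_id">` and `suffix</span>`, or None."""
    pattern = re.escape(element_id) + r'">(.*?)' + re.escape(suffix) + r"</span>"
    # DOTALL lets the lazy group span line breaks in the rendered HTML
    match = re.search(pattern, html, re.DOTALL)
    return match.group(1) if match else None


print(extract(SAMPLE_HTML, "alarmUdaj", "<br>"))  # OK
print(extract(SAMPLE_HTML, "klapkaUdaj", "%"))    # 35
```

If a call returns None, the pattern did not match the HTML that Splash delivered, which is a quick way to spot a wrong element name or too short a loading time.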

I hope this is of use to anyone trying something similar.

