I have an inverter with no accessible API, so I want to scrape its web page to get the energy production data.
In the HTML code you find something like this:
<tr>
  <!-- 历史发电量 -->
  <th scope="row">Lifetime generation</th>
  <td>64.2 kWh </td>
</tr>
<tr>
  <!-- 最近一次系统功率 -->
  <th scope="row">Last System Power</th>
  <td>847 W </td>
</tr>
<tr>
  <!-- 系统当天累计发电量 -->
  <th scope="row">Generation of Current Day</th>
  <td>0.69 kWh </td>
</tr>
So, I want to extract the value for the current day.
This is my first time using regular expressions, so I tried this:
.*Generation of Current Day<\/th>\n\s*<td>[+-]?\d+((\.|\,)\d+)? kWh <\/td>.*
If I test that on https://regex101.com/, it matches:
<th scope="row">Generation of Current Day</th> <td>0.69 kWh </td>
Of course, I still have to reduce the output to the numeric value only. Any hints?
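One common way to get only the number is to put a capturing group around the numeric part of the pattern. The following is a minimal Python sketch of that idea, not an openHAB configuration. It assumes the page looks like the HTML above, with a line break between </th> and <td>; the names `html`, `pattern`, and `extract_daily_kwh` are made up for illustration.

```python
import re
from typing import Optional

# Sample markup modelled on the HTML above (assumed layout).
html = """<tr>
<th scope="row">Last System Power</th>
<td>847 W </td>
</tr>
<tr>
<th scope="row">Generation of Current Day</th>
<td>0.69 kWh </td>
</tr>"""

# The parentheses around the number form capture group 1.
# (?:...) is a non-capturing group for the optional decimal part.
pattern = re.compile(
    r"Generation of Current Day</th>\s*<td>\s*([+-]?\d+(?:[.,]\d+)?)\s*kWh"
)


def extract_daily_kwh(page: str) -> Optional[float]:
    """Return today's generation in kWh, or None if the row is not found."""
    match = pattern.search(page)
    if match is None:
        return None
    # Accept a decimal comma as well as a decimal point.
    return float(match.group(1).replace(",", "."))


print(extract_daily_kwh(html))
```

Only the text inside the parentheses ends up in `match.group(1)`, so the unit and the surrounding tags are dropped without a second processing step.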
I tried to use this with openHAB 3 (OH3) to get a first result there:
- Added an HTTP thing and a channel that already receives the complete HTML code as a string.
- Installed the REGEX transformation
- Configured the channel:
The result is that the item is empty.
- How do I configure the channel so that I receive the right value?
- Is there any optimization for the regular expression?
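
A possible reason for the empty item, stated as an assumption to check against the REGEX transformation documentation: the transformation is understood to require the pattern to match the entire input and to return the first capture group. If so, `.` must also be allowed to match line breaks, otherwise `.*` cannot span a multi-line page. The Python sketch below only mimics that behaviour with `re.fullmatch`; openHAB uses Java regular expressions, so details may differ. The names `page`, `whole_page_pattern`, and `first_group_of_full_match` are hypothetical.

```python
import re

# A small multi-line page modelled on the HTML in the question (assumed layout).
page = (
    "<html><body><table>\n"
    '<tr>\n<th scope="row">Lifetime generation</th>\n<td>64.2 kWh </td>\n</tr>\n'
    '<tr>\n<th scope="row">Generation of Current Day</th>\n<td>0.69 kWh </td>\n</tr>\n'
    "</table></body></html>\n"
)

# (?s) lets "." match newlines, so the leading and trailing ".*"
# can cover the rest of the multi-line page.
whole_page_pattern = (
    r"(?s).*Generation of Current Day</th>\s*<td>\s*"
    r"([+-]?\d+(?:[.,]\d+)?)\s*kWh.*"
)


def first_group_of_full_match(pattern, text):
    """Return capture group 1 if the pattern matches the whole text, else None."""
    match = re.fullmatch(pattern, text)
    return match.group(1) if match else None


# Same pattern without the (?s) flag, for comparison.
without_dotall = whole_page_pattern.replace("(?s)", "", 1)

print(first_group_of_full_match(whole_page_pattern, page))
print(first_group_of_full_match(without_dotall, page))
```

In this sketch the first call finds the value, while the second returns `None` because `.*` stops at the first line break and the whole-input match fails.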