markusth
(Markus Thomanek)
April 30, 2021, 7:13am
1
I am trying to scrape COVID-19 numbers from the RKI webpage (RKI - Coronavirus SARS-CoV-2 - COVID-19: Fallzahlen in Deutschland und weltweit), but it doesn't work as expected. When I test my expression on regex101:
expression to test: /Gesamt<\/strong><\/td><td class="right" colspan="1" rowspan="1"><strong>([\d\.]+)/gm
I get the result: 1 match, 1 group, which is exactly what I want. I tried several alternatives to get this transformation working in a profile in the openHAB 3 (OH3) UI. These are the variants I entered in the “Profile Configuration - Regular Expression” field:
Gesamt<\\/strong><\\/td><td class=\"right\" colspan=\"1\" rowspan=\"1\"><strong>([\\d\\.]+)
Gesamt<\\/strong><\\/td><td class="right" colspan="1" rowspan="1"><strong>([\\d\\.]+)
Gesamt<\\\/strong><\\\/td><td class=\"right\" colspan=\"1\" rowspan=\"1\"><strong>([\\\d\\\.]+)
Gesamt<\/strong><\/td><td class="right" colspan="1" rowspan="1"><strong>([\d\.]+)
I also tried variants with a leading “/” and a trailing “/gm”. Unfortunately, none of them worked. The only thing I got working is the default profile with no transformation, which gives me the whole HTML result. I wonder what I am doing wrong and hope someone here can guide me.
Thanks in advance
Markus
glhopital
(Gaël L'hopital)
April 30, 2021, 7:59am
2
Have you seen that a binding for this already exists here?
markusth
(Markus Thomanek)
April 30, 2021, 8:46am
3
Yes, I did, and I have already tried it. Unfortunately, it doesn’t fit my needs. Besides, I have other use cases where a working regex profile transformation is needed. I just picked this one because the source is publicly available.
Nevertheless thanks for mentioning!
rossko57
(Rossko57)
April 30, 2021, 9:41am
4
The openHAB REGEX Transformation Service is not a fully featured regex engine. One obvious limitation, a consequence of the uses it is intended for (e.g. manipulating a single Item state), is that it can only return one match: no arrays or similar.
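A rough illustration of why a pattern that matches on regex101 can still return nothing here: as far as I understand it, the REGEX transformation applies the pattern to the whole input and returns the first capture group, so a pattern that only describes a fragment needs `.*` on both sides. The sketch below emulates that behaviour with Python's `re.fullmatch`; it is not openHAB code. The helper name `regex_transform`, the HTML fragment and the number in it are made up for illustration.

```python
import re


def regex_transform(pattern: str, source: str):
    """Emulate a whole-input match that returns only the first capture group.

    Assumption: this mirrors how the REGEX transformation behaves;
    it is a sketch, not the actual openHAB implementation.
    """
    match = re.fullmatch(pattern, source.strip(), flags=re.DOTALL)
    if match is None or match.re.groups == 0:
        return None
    return match.group(1)


# Hypothetical fragment shaped like the table cell the expression targets.
html = (
    '<tr><td><strong>Gesamt</strong></td>'
    '<td class="right" colspan="1" rowspan="1"><strong>1.234.567</strong></td></tr>'
)

# "/" needs no escaping here; it is only a delimiter on regex101.
bare = r'Gesamt</strong></td><td class="right" colspan="1" rowspan="1"><strong>([\d.]+)'
wrapped = ".*" + bare + ".*"

print(regex_transform(bare, html))     # fragment-only pattern: no whole-input match
print(regex_transform(wrapped, html))  # wrapped pattern: first capture group
```

With the fragment-only pattern the helper returns `None`, while the wrapped pattern returns the captured number as a string. The leading `/` and trailing `/gm` from regex101 are delimiters and flags of that tool, not part of the expression.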
markusth
(Markus Thomanek)
April 30, 2021, 11:11am
5
Does this mean my match needs to be my capture, and that I need to rewrite my expression to fit this requirement? Currently I capture only one numeric value, but it differs from the full match.
s0170071
(S0170071)
April 30, 2021, 1:57pm
7
I grab most of the numbers from a JSON source; only one of them is extracted with a regex.
My Corona setup is:
.things file:
Thing http:url:lubu "Lubu CoVid-19" [baseURL="https://www.ludwigsburg.de/start/rathaus+und+service/", refresh="600", timeout="3000"] {
    Channels:
        Type string : Channel_Corona_S_Ludwigsburg_Incidence "S LB 7d [%s]" [ stateExtension="corona+7-tage-inzidenz.html", stateTransformation="REGEX:.*?<br>Stadt Ludwigsburg ([+-]?([0-9]*[,])?[0-9]+)?.*"]
}
Thing http:url:rki "RKI CoVid-19" [baseURL="https://api.corona-zahlen.org/", commandMethod="GET", delay="500", refresh="600", timeout="1000"] {
    Channels:
        Type number : Channel_Corona_LK_Ludwigsburg_Incidence "LK LB 7d [%.1f]" [ stateExtension="districts", stateTransformation="JSONPATH:$.data.08118.weekIncidence"]
        Type number : Channel_Corona_Berlin_Incidence "ST Berlin Treptow 7d [%.1f]" [ stateExtension="districts", stateTransformation="JSONPATH:$.data.11009.weekIncidence"]
        Type number : Channel_Corona_SK_Saarbruecken_Incidence "LK SB 7d [%.1f]" [ stateExtension="districts", stateTransformation="JSONPATH:$.data.10041.weekIncidence"]
        Type number : Channel_Corona_S_Weiden_Incidence "Stadt Weiden 7d [%.1f]" [ stateExtension="districts", stateTransformation="JSONPATH:$.data.09363.weekIncidence"]
        Type number : Channel_Corona_Deutschland_Incidence "Deutschland 7d [%.1f]" [ stateExtension="germany", stateTransformation="JSONPATH:$.weekIncidence"]
        Type number : Channel_Corona_Deutschland_R "Deutschland R [%.2f]" [ stateExtension="germany", stateTransformation="JSONPATH:$.r.value"]
}
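To try the two kinds of transformation outside openHAB, something like the following Python sketch may help. It applies the same regex as the `lubu` channel to a whole page and resolves a simple dotted path like the `JSONPATH` expressions above. The helper names `extract_incidence` and `lookup`, and all sample data, are invented for illustration; real responses from either site may look different, and `lookup` only handles plain dotted paths, not full JSONPath.

```python
import json
import re

# Same expression as in the lubu channel: ".*" on both sides so the
# pattern covers the whole page, with the value in capture group 1.
LUBU_PATTERN = r".*?<br>Stadt Ludwigsburg ([+-]?([0-9]*[,])?[0-9]+)?.*"


def extract_incidence(html: str):
    """Return the captured value as a float, or None if nothing was captured."""
    match = re.fullmatch(LUBU_PATTERN, html.strip(), flags=re.DOTALL)
    if match is None or match.group(1) is None:
        return None
    # The page uses a decimal comma, which is why the channel is a string.
    return float(match.group(1).replace(",", "."))


def lookup(doc, path: str):
    """Resolve a simple dotted path such as $.data.08118.weekIncidence."""
    node = doc
    for key in path.lstrip("$").strip(".").split("."):
        node = node[key]
    return node


# Made-up sample inputs, not real figures.
sample_html = "<p>Landkreis Ludwigsburg 98,7<br>Stadt Ludwigsburg 123,4</p>"
sample_json = json.loads('{"data": {"08118": {"weekIncidence": 150.5}}}')

print(extract_incidence(sample_html))
print(lookup(sample_json, "$.data.08118.weekIncidence"))
```

Because the whole capture group is optional in the regex, a page without a number after “Stadt Ludwigsburg ” still matches but yields no value, which the sketch reports as `None`.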
.items file
Group gCorona (gHaus)
String Corona_S_Ludwigsburg_Incidence "S Ludwigsburg 7d [%s]" <line> (gCorona) [ "Corona","Measurement" ] {channel="http:url:lubu:Channel_Corona_S_Ludwigsburg_Incidence", expire="12h"}
Number Corona_LK_Ludwigsburg_Incidence "LK Ludwigsburg 7d [%.1f]" <line> (gCorona) [ "Corona","Measurement" ] {channel="http:url:rki:Channel_Corona_LK_Ludwigsburg_Incidence", expire="12h"}
Number Corona_Berlin_Incidence "Berlin Treptow 7d [%.1f]" <line> (gCorona) [ "Corona","Measurement" ] {channel="http:url:rki:Channel_Corona_Berlin_Incidence", expire="12h"}
Number Corona_SK_Saarbruecken_Incidence "Saarbrücken 7d [%.1f]" <line> (gCorona) [ "Corona","Measurement" ] {channel="http:url:rki:Channel_Corona_SK_Saarbruecken_Incidence", expire="12h"}
Number Corona_S_Weiden_Incidence "Weiden 7d [%.1f]" <line> (gCorona) [ "Corona","Measurement" ] {channel="http:url:rki:Channel_Corona_S_Weiden_Incidence", expire="12h"}
Number Corona_Deutschland_Incidence "Deutschland 7d [%.1f]" <line> (gCorona) [ "Corona","Measurement" ] {channel="http:url:rki:Channel_Corona_Deutschland_Incidence", expire="12h"}
Number Corona_Deutschland_R "Deutschland R [%.1f]" <chart> (gCorona) [ "Corona","Measurement" ] {channel="http:url:rki:Channel_Corona_Deutschland_R", expire="12h"}
.sitemap file
sitemap corona label="Corona" {
    Default item=gCorona
}