Regex in rule

sovapatr · August 24, 2017, 7:18pm

I’m trying to send a particular YouTube video to a Chromecast. I’ve got an Exec Thing running youtube-dl to get the correct web address that the Chromecast will accept. My issue is that it sometimes returns the URL followed by a “WARNING: unable to extract uploader nickname” line at the end. I figured I could use a regex to capture only the first line, but no matter what I try it seems to capture the entire string. Is there something simple I’m missing? I’m still new to writing rules.

My rule currently:

rule "Youtube Switch On"
when
  Item VR_SW_Youtube changed to ON
then
  var url = transform("REGEX", "(https.+)[\\s\\S]*", Exec_YT_Output.state.toString())
  CC_01_PlayURI.sendCommand(url)
  CC_01_Control.sendCommand(PLAY)
end

rlkoshak · August 24, 2017, 8:13pm

First, it is all but impossible to tell without seeing EXACTLY what string you are trying to match against.

Reading your REGEX, it looks like it is matching: “(https plus one or more arbitrary characters) followed by zero or more white space characters”. Since regex is greedy, it tries to find the largest segment that matches this, which in this case will be the full string.

What you want is something that matches roughly the following (I don’t use regex enough to type out what this would look like, but there are tons of resources on Google):

“(https plus one or more arbitrary characters except for spaces) followed by zero or more spaces followed by zero or more additional characters”

Having said that, it would probably be easier to just use a simple String.split() call than a REGEX.

If the URL and the warning are separated by a space use:

val url = Exec_YT_Output.state.toString.split(' ').get(0)

sovapatr · August 24, 2017, 8:36pm

Sorry, I didn’t put the whole string in because it is long and messy, and I figured the intent was fairly clear from the regex. It starts with a URL containing https, and some of the time it is followed by a newline and then two lines reading "WARNING: unable to extract uploader nickname".

So the string ends up as something like:

https://someurl.googlevideo.com/videoplayback?lots=of&different=options
WARNING: unable to extract uploader nickname
WARNING: unable to extract uploader nickname

I was using \s not for a space but for any whitespace character, to handle newlines.

I read in the docs that the transform function wraps the regex in ^ and $, which is why I put [\s\S]* at the end. When I drop it into regex101.com it seems like it should work, so I wondered whether I was using the transform function wrong rather than the regex itself.
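A minimal Java sketch of why regex101.com and the rule can disagree. It assumes, as the docs note and as confirmed later in this thread, that the transform wraps the expression in ^...$ and compiles it with Pattern.DOTALL; the helper firstGroup and the sample string are hypothetical stand-ins, not the actual service code. regex101's default flags leave the s (DOTALL) flag off, so there "." stops at the newline.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotAllDemo {
    // Hypothetical helper: anchors the expression with ^...$ (as the transform is
    // described as doing), then returns capture group 1, or null if nothing matches.
    static String firstGroup(String regex, String source, int flags) {
        Matcher matcher = Pattern.compile("^" + regex + "$", flags).matcher(source);
        return matcher.matches() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        // Sample shaped like the youtube-dl output above.
        String output = "https://someurl.googlevideo.com/videoplayback?lots=of&different=options\n"
                + "WARNING: unable to extract uploader nickname\n"
                + "WARNING: unable to extract uploader nickname";
        String regex = "(https.+)[\\s\\S]*";

        // No flags (like regex101's default): '.' stops at the first newline,
        // so group 1 is only the URL line.
        System.out.println(firstGroup(regex, output, 0));

        // DOTALL: '.+' greedily runs to the end of the input and [\s\S]*
        // matches the empty string, so group 1 is the whole output.
        System.out.println(firstGroup(regex, output, Pattern.DOTALL));
    }
}
```

The regex itself is fine in both cases; only the flag changes what ".+" is allowed to swallow.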

I’ll try the split method. I just wasn’t entirely sure if that would throw an error in the cases where the extra lines weren’t present.
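On the worry about errors: in Java, which the rules DSL runs on, String.split does not throw when the delimiter is missing; it returns a one-element array holding the whole string. Note, though, that the sample output separates the URL from the warnings with a newline, not a space, so split(' ') would leave "\nWARNING:" attached to the URL. Splitting on the regex \s+ (any run of whitespace) handles both cases. A plain-Java sketch; firstToken is a hypothetical name.

```java
class SplitDemo {
    // Hypothetical helper: returns everything before the first run of whitespace.
    // String.split takes a regex; "\\s+" matches spaces, tabs and newlines.
    // trim() avoids an empty first element if the output starts with whitespace.
    static String firstToken(String output) {
        return output.trim().split("\\s+")[0];
    }

    public static void main(String[] args) {
        String withWarnings = "https://someurl.googlevideo.com/videoplayback?lots=of\n"
                + "WARNING: unable to extract uploader nickname";
        String urlOnly = "https://someurl.googlevideo.com/videoplayback?lots=of";

        // Splitting on a single space keeps the newline and the first "WARNING:".
        System.out.println(withWarnings.split(" ")[0]);

        // Splitting on any whitespace isolates the URL in both cases; with no
        // whitespace at all, split returns a one-element array, so [0] is safe.
        System.out.println(firstToken(withWarnings));
        System.out.println(firstToken(urlOnly));
    }
}
```

In the rule this would presumably become Exec_YT_Output.state.toString.trim.split("\\s+").get(0), though that line is untested here.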

namraccr · August 24, 2017, 10:39pm

Another way to go is to take the whole string and check whether it contains the "WARNING: " text. If so, strip it out and repeat.
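A sketch of that strip-and-repeat idea in plain Java, assuming each warning sits on its own line as in the sample output above. The class and method names are hypothetical; each pass removes the text from the marker up to and including the next newline (or to the end of the string), and the loop repeats until no marker remains.

```java
class StripWarningsDemo {
    static final String MARKER = "WARNING: ";

    // Hypothetical helper: while the text still contains the marker, cut from the
    // marker to the next newline (inclusive), or to the end if there is none.
    static String stripWarnings(String output) {
        String text = output;
        while (text.contains(MARKER)) {
            int start = text.indexOf(MARKER);
            int end = text.indexOf('\n', start);
            text = (end < 0)
                    ? text.substring(0, start)
                    : text.substring(0, start) + text.substring(end + 1);
        }
        // Drop the newline that separated the URL from the removed lines.
        return text.trim();
    }

    public static void main(String[] args) {
        String output = "https://someurl.googlevideo.com/videoplayback?lots=of\n"
                + "WARNING: unable to extract uploader nickname\n"
                + "WARNING: unable to extract uploader nickname";
        System.out.println(stripWarnings(output));
        // Input without warnings passes through unchanged (apart from trimming).
        System.out.println(stripWarnings("https://someurl.googlevideo.com/videoplayback?lots=of"));
    }
}
```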

namraccr · August 25, 2017, 1:20am

. matches everything.

5iver · August 25, 2017, 2:36am

I guess we’re in dotall mode then? Taking into account the anchors too, how about this:

var url = transform("REGEX", "(https[^\\s]+)[\\s\\S]*", Exec_YT_Output.state.toString)
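Checking that suggestion under the same assumption (the expression wrapped in ^...$ and compiled with Pattern.DOTALL): [^\s]+ can never cross a newline, whatever the flags, so group 1 stops at the end of the URL and [\s\S]* soaks up any warning lines. The helper below is a hypothetical stand-in that only mirrors the described behaviour of the transform.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonSpaceDemo {
    // Hypothetical stand-in for the transform: anchored, DOTALL on, group 1 out.
    static String extractUrl(String source) {
        Pattern pattern = Pattern.compile("^(https[^\\s]+)[\\s\\S]*$", Pattern.DOTALL);
        Matcher matcher = pattern.matcher(source);
        return matcher.matches() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String urlOnly = "https://someurl.googlevideo.com/videoplayback?lots=of&different=options";
        String withWarnings = urlOnly + "\n"
                + "WARNING: unable to extract uploader nickname\n"
                + "WARNING: unable to extract uploader nickname";

        // Both calls print just the URL: the non-whitespace run ends at the newline.
        System.out.println(extractUrl(withWarnings));
        System.out.println(extractUrl(urlOnly));
    }
}
```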

namraccr · August 25, 2017, 2:39am

I was confused when I read this. I had to go and google “dotall”.

Do you know if this terrible behaviour originated in Java, or was it borrowed from one of the other regex variants out there?

5iver · August 25, 2017, 12:28pm

Confirmed… the transform service has dotall enabled (line 60).
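One further option, not raised in the thread: since DOTALL is on, Java's inline flag (?-s) can switch it back off for the rest of the expression, which should make the original (https.+)[\s\S]* behave as it did on regex101. This is a sketch using the same hypothetical anchored, DOTALL-enabled helper; it has not been tried against the actual transform service.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InlineFlagDemo {
    // Hypothetical stand-in: wraps the expression in ^...$ and compiles with DOTALL.
    static String transform(String regex, String source) {
        Matcher matcher = Pattern.compile("^" + regex + "$", Pattern.DOTALL).matcher(source);
        return matcher.matches() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String output = "https://someurl.googlevideo.com/videoplayback?lots=of\n"
                + "WARNING: unable to extract uploader nickname";

        // DOTALL lets '.' cross the newline, so this prints the whole output.
        System.out.println(transform("(https.+)[\\s\\S]*", output));

        // (?-s) is a non-capturing flag group that turns DOTALL off from that point,
        // so group 1 is still (https.+) and this prints only the URL.
        System.out.println(transform("(?-s)(https.+)[\\s\\S]*", output));
    }
}
```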

github.com/eclipse/smarthome/blob/master/extensions/transform/org.eclipse.smarthome.transform.regex/src/main/java/org/eclipse/smarthome/transform/regex/internal/RegExTransformationService.java

sovapatr · August 25, 2017, 12:36pm

I updated my rule to use [^\s]+ inside the capture group. Haven’t had the warning today, but I think that should solve it. Thank you for that bit of insight. I use regex for work regularly and had never come across the dotall thing.