Simple but extensible Voice Control with JavaScript

JanMattner · August 24, 2021, 9:12pm

UPDATE: Now available on GitHub: https://github.com/JanMattner/voice-control-openhab

I’d like to share the current state of my simple personal voice control.

My goals:

Using voice instead of some UI (e.g. Main UI) seems more intuitive, easier and quicker to me
I do not want any smart AI - just a way to trigger something via voice instead of a button (i.e. I know how my items are named, I do not need an AI to interpret natural language and extract semantics)

My setup:

Raspberry Pi running OpenHabian
Android App with the Speech-To-Text feature (by Google)

I have seen the built-in Java interpreter and thought: well, that’s nice, but I’d like an easy way to add custom rules without building the Java code.
That’s why I took those ideas and implemented them in JavaScript (the out-of-the-box ES5).

So that’s the outcome:

Released under the terms of the MIT license:

MIT License

Copyright (c) 2021 Jan Mattner

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Content of /automation/lib/javascript/personal/ruleBasedInterpreter.js:

// HOW TO load this file from a script/rule created in Main UI:
// var OPENHAB_CONF = Java.type('java.lang.System').getenv('OPENHAB_CONF');
// load(OPENHAB_CONF + '/automation/lib/javascript/personal/ruleBasedInterpreter.js');

'use strict';

(function(context) {
    'use strict';
    var logger = Java.type('org.slf4j.LoggerFactory').getLogger('org.openhab.core.automation.ruleBasedInterpreter');

    // ***
    // DATA
    // ***
    var expressionTypes = {
        SEQUENCE: "sequence",
        COMMAND: "command",
        ALTERNATIVE: "alternative",
        OPTIONAL: "optional",
        ITEMLABEL: "itemlabel"
    };

    var rules = [];

    // ***
    // FUNCTIONS
    // ***

    function alt(params) {
        return {
            expressionType: expressionTypes.ALTERNATIVE,
            value: params || []
        }
    }

    function seq(params) {
        return {
            expressionType: expressionTypes.SEQUENCE,
            value: params || []
        }
    }

    function opt(expression) {
        return {
            expressionType: expressionTypes.OPTIONAL,
            value: expression
        }
    }

    function cmd(expression, command) {
        return {
            expressionType: expressionTypes.COMMAND,
            value: expression,
            command: command
        }
    }

    function itemLabel() {
        return {
            expressionType: expressionTypes.ITEMLABEL
        }
    }

    function addRule(expression, executeFunction) {
        rules.push({
            expression: expression,
            executeFunction: executeFunction
        });
    }

    function interpretUtterance(utterance) {
        var normalizedUtterance = normalizeUtterance(utterance);
        var tokens = tokenizeUtterance(normalizedUtterance);

        logger.debug("input normalized utterance: " + normalizedUtterance);
        logger.debug("input tokens: " + stringify(tokens));

        for (var index = 0; index < rules.length; index++) {
            logger.debug("check rule " + index);
            var rule = rules[index];
            logger.debug(stringify(rule));
            var result = evaluateExpression(rule.expression, tokens.slice());
            if (result.success) {
                var executeFunction = result.executeFunction || rule.executeFunction;
                if (!executeFunction) {
                    logger.debug("rule matched, but no function to execute found, continue");
                    continue;
                }

                executeFunction(result.executeParameter);
                break;
            }
        }        
    }

    function evaluateExpression(expression, tokens) {
        if (tokens.length < 1) {
            return createEvaluationResult(true, tokens, null, null);
        }

        if (typeof(expression) == "string") {
            return evaluateStringExpression(expression, tokens);
        }

        switch (expression.expressionType) {
            case expressionTypes.SEQUENCE:
                return evaluateSequence(expression, tokens);
            case expressionTypes.ALTERNATIVE:
                return evaluateAlternative(expression, tokens);
            case expressionTypes.OPTIONAL:
                return evaluateOptional(expression, tokens);
            case expressionTypes.COMMAND:
                return evaluateCommand(expression, tokens);
            case expressionTypes.ITEMLABEL:
                return evaluateItemLabel(tokens);
            default:
                return createEvaluationResult(false, tokens, null, null);
        }
    }

    /**
     * 
     * @param {boolean} success - if evaluation was successful or not
     * @param {string[]} remainingTokens 
     * @param {function} executeFunction - the function to execute in the end 
     * @param {object} executeParameter - the parameter inserted in the executeFunction. Should be a single object that can hold multiple parameters in its key/value pairs.
     * @returns {object} - exactly the above parameters in an object
     */
    function createEvaluationResult(success, remainingTokens, executeFunction, executeParameter) {
        return {
            success: success,
            remainingTokens: remainingTokens,
            executeFunction: executeFunction,
            executeParameter: executeParameter
        };
    }

    function evaluateStringExpression(expression, tokens) {
        if (tokens.length < 1) {
            return createEvaluationResult(false, tokens, null, null);
        }

        logger.debug("eval string: " + expression)
        logger.debug("token: " + tokens[0]);
        var hasMatch = tokens[0].match(expression) != null;
        logger.debug("hasMatch: " + hasMatch)
        return createEvaluationResult(hasMatch, tokens.slice(1), null, null);
    }

    function evaluateOptional(expression, tokens) {
        logger.debug("eval opt: " + stringify(expression))
        var result = evaluateExpression(expression.value, tokens.slice());
        if (result.success) {
            logger.debug("eval opt success")
            // only return the reduced token array and other parameters if optional expression was successful.
            return createEvaluationResult(true, result.remainingTokens, result.executeFunction, result.executeParameter);
        }
        
        logger.debug("eval opt fail")
        // otherwise still return successful, but nothing from the optional expression result
        return createEvaluationResult(true, tokens, null, null);
    }

    function evaluateCommand(expression, tokens) {
        logger.debug("eval cmd: " + stringify(expression.value));
        var result = evaluateExpression(expression.value, tokens);
        
        logger.debug("eval cmd result: " + result.success)
        if (!result.success) {
            return createEvaluationResult(false, tokens, null, null);
        }

        var executeFunction = function(parameter) {
            if (!parameter || typeof(parameter) != "object") {
                logger.debug("Trying to send a command, but no proper object parameter found")
                return;
            }
            var item = parameter.item;
            if (!item) {
                logger.debug("Trying to send a command, but no item parameter found")
                return;
            }

            events.sendCommand(item, expression.command);
        }
        return createEvaluationResult(true, result.remainingTokens, executeFunction, result.executeParameter);
    }

    function evaluateItemLabel(tokens) {
        logger.debug("eval item label with tokens: " + stringify(tokens))
        
        if (tokens.length < 1) {
            logger.debug("no tokens, eval item label fail")
            return createEvaluationResult(false, tokens, null, null);
        }

        // get whole item registry; since that's only a Java list, convert it first to a JS array
        // and by that way, normalize and tokenize the label for easier comparison
        var allItems = Java.from(itemRegistry.getItems())
            .map(function(i){
                return {
                    item: i,
                    // items without a label would otherwise throw in toLowerCase()
                    labelTokens: tokenizeUtterance(normalizeUtterance(i.getLabel() || ""))
                }
            });

        // we need a single exact match
        // first try the regular labels
        var checkLabels = function(remainingItems) {
            var tokenIndex = 0;
            while (remainingItems.length > 1) {
                if (tokens.length < tokenIndex + 1) {
                    // no tokens left, but still multiple possible items -> abort
                    return {remainingItems: remainingItems, tokenIndex: tokenIndex};
                }

                remainingItems = remainingItems.filter(function(entry) {
                    return (entry.labelTokens.length >= tokenIndex + 1) && entry.labelTokens[tokenIndex] == tokens[tokenIndex];
                });

                tokenIndex++;
            }

            return {remainingItems: remainingItems, tokenIndex: tokenIndex};
        };

        var matchResult = checkLabels(allItems.slice());

        logger.debug("eval item found matched labels: " + matchResult.remainingItems.length);

        if (matchResult.remainingItems.length != 1) {
            // either none or multiple matches found. Let's try the synonyms.
            var checkSynonyms = function(allItems) {
                var remainingItems = allItems.map(function(i){
                    return {
                        item: i.item,
                        synonymTokens: getSynonyms(i.item.getName()).map(function(s){ return tokenizeUtterance(normalizeUtterance(s));})
                    }
                });

                // remove items without synonyms
                remainingItems = remainingItems.filter(function(i) {
                    return i.synonymTokens.length > 0;
                });

                var tokenIndex = 0;
                while (remainingItems.length > 1) {
                    if (tokens.length < tokenIndex + 1) {
                        // no tokens left, but still multiple possible items -> abort
                        return {remainingItems: remainingItems, tokenIndex: tokenIndex};
                    }
        
                    // remove synonyms with fewer or non-matching tokens
                    remainingItems = remainingItems.map(function(i) {
                        i.synonymTokens = i.synonymTokens.filter(function(t) {
                            return (t.length >= tokenIndex + 1) && (t[tokenIndex] == tokens[tokenIndex]);
                        });
                        return i;
                    });
                    
                    // remove items without synonyms
                    remainingItems = remainingItems.filter(function(i) {
                        return i.synonymTokens.length > 0;
                    });
                    
                    tokenIndex++;
                }
    
                return {remainingItems: remainingItems, tokenIndex: tokenIndex};
            }

            matchResult = checkSynonyms(allItems.slice());

            logger.debug("eval item found matched synonyms: " + matchResult.remainingItems.length);
        }
        

        if (matchResult.remainingItems.length == 1) {
            logger.debug("eval item label success")
            return createEvaluationResult(true, tokens.slice(matchResult.tokenIndex), null, {item: matchResult.remainingItems[0].item});
        }

        logger.debug("eval item label fail")
        return createEvaluationResult(false, tokens, null, null);
    }

    /**
     * Returns the metadata on the passed in item name with the given namespace.
     * Credits to Rich Koshak.
     * @param {string} itemName name of the item to search the metadata on
     * @param {string} namespace namespace of the metadata to return
     * @return {Metadata} the value and configuration or null if the metadata doesn't exist
     */
    function getMetadata(itemName, namespace) {
        var FrameworkUtil = Java.type("org.osgi.framework.FrameworkUtil");
        var _bundle = FrameworkUtil.getBundle(scriptExtension.class);
        var bundle_context = _bundle.getBundleContext()
        var MetadataRegistry_Ref = bundle_context.getServiceReference("org.openhab.core.items.MetadataRegistry");
        var MetadataRegistry = bundle_context.getService(MetadataRegistry_Ref);
        var MetadataKey = Java.type("org.openhab.core.items.MetadataKey");
        return MetadataRegistry.get(new MetadataKey(namespace, itemName));
    }

    function getSynonyms(itemName) {
        var meta = getMetadata(itemName, "synonyms");
        if (stringIsNullOrEmpty(meta)) {
            return [];
        }

        return meta.value.split(",");
    }

    function evaluateSequence(expression, tokens) {
        logger.debug("eval seq: " + stringify(expression));
        var success = true;
        var executeFunction = null;
        var executeParameter = null;

        var remainingTokens = tokens.slice();

        for (var index = 0; index < expression.value.length; index++) {
            var subexp = expression.value[index];
            if (remainingTokens.length < 1) {
                // no more tokens left, but another sub expression is required
                // -> no match of full sequence possible, we can already abort at this point
                success = false;
                break;
            }
            logger.debug("eval subexp " + index + "; subexp: " + stringify(subexp))
            var result = evaluateExpression(subexp, remainingTokens);
            if (!result.success) {
                success = false;
                break;
            }

            remainingTokens = result.remainingTokens;
            executeFunction = result.executeFunction || executeFunction;
            executeParameter = result.executeParameter || executeParameter;
        }
        
        logger.debug("eval seq: " + success)
        return createEvaluationResult(success, remainingTokens, executeFunction, executeParameter);
    }

    function evaluateAlternative(expression, tokens) {
        logger.debug("eval alt: " + stringify(expression));
        logger.debug("for tokens: " + stringify(tokens));
        if (tokens.length < 1) {
            logger.debug("eval alt fail")
            // no more tokens left, but at least one sub expression is required
            // -> no match of any alternative possible, we can already abort at this point
            return createEvaluationResult(false, tokens, null, null);
        }

        var success = false;
        var executeFunction = null;
        var remainingTokens = tokens;
        var executeParameter = null;

        for (var index = 0; index < expression.value.length; index++) {
            var subexp = expression.value[index];
            logger.debug("alt index: " + index + "; subexp: " + stringify(subexp));
            var result = evaluateExpression(subexp, tokens.slice());
            if (result.success) {
                success = true;
                remainingTokens = result.remainingTokens;
                executeFunction = result.executeFunction || executeFunction;
                executeParameter = result.executeParameter || executeParameter;
                break;
            }
        }
        
        logger.debug("eval alt: " + success)
        return createEvaluationResult(success, remainingTokens, executeFunction, executeParameter);
    }

    function normalizeUtterance(utterance) {
        return utterance.toLowerCase();
    }

    function tokenizeUtterance(utterance) {
        return utterance.split(" ").filter(Boolean);
    }

    function stringify(obj) {
        return JSON.stringify(obj, null, 2);
    }

    function stringIsNullOrEmpty(str) {
        return str === undefined || str === null || str === "";
    }

    // ***
    // EXPORTS
    // ***
    context.ruleBasedInterpreter = {
        interpretUtterance: interpretUtterance,
        alt: alt,
        seq: seq,
        opt: opt,
        cmd: cmd,
        itemLabel: itemLabel,
        addRule: addRule
    }
})(this);

Then I can load this and use the functions to define my rules in a separate file.

Content of /automation/lib/javascript/personal/voiceCommandRules.js:
(Sorry, the example is only in German, but it is much the same as in the Java interpreter mentioned above.)

// HOW TO load this file from a script/rule created in Main UI:
// var OPENHAB_CONF = Java.type('java.lang.System').getenv('OPENHAB_CONF');
// load(OPENHAB_CONF + '/automation/lib/javascript/personal/voiceCommandRules.js');

var OPENHAB_CONF = Java.type('java.lang.System').getenv('OPENHAB_CONF');
load(OPENHAB_CONF + '/automation/lib/javascript/personal/ruleBasedInterpreter.js');

'use strict';

(function(context) {
    'use strict';

    var alt = ruleBasedInterpreter.alt;
    var seq = ruleBasedInterpreter.seq;
    var cmd = ruleBasedInterpreter.cmd;
    var opt = ruleBasedInterpreter.opt;
    var itemLabel = ruleBasedInterpreter.itemLabel;
    var addRule = ruleBasedInterpreter.addRule;
    
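    var denDieDas = alt(["den", "die", "das"]); // German articles ("the")
    var einAnAus = alt([cmd("ein", ON), cmd("an", ON), cmd("aus", OFF)]); // "ein"/"an" = on, "aus" = off
    var schalte = alt(["schalte", "mache", "schalt", "mach"]); // "switch"/"make"
    var fahre = alt(["fahre", "fahr", "mache", "mach"]); // "drive"/"make" (used for shutters)
    var hochRunter = alt([cmd("hoch", UP), cmd("runter", DOWN)]); // "hoch" = up, "runter" = down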

    // ON OFF type
    addRule(seq([schalte, opt(denDieDas), itemLabel(), einAnAus]));
    
    // UP DOWN type
    addRule(seq([fahre, opt(denDieDas), itemLabel(), hochRunter]));
})(this);
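
For readers who do not speak German, here is a rough English equivalent of the ON/OFF rule as a standalone sketch. It is not the posted interpreter: the builders are reduced to plain matching, item labels are replaced by the fixed words "kitchen" and "light", tokens are compared for exact equality, and no command is carried along. The `match` function and the `type` field are names made up for this illustration.

```javascript
// Builders with the same names as in the post, reduced to plain matching.
function alt(list) { return { type: "alternative", value: list }; }
function seq(list) { return { type: "sequence", value: list }; }
function opt(expression) { return { type: "optional", value: expression }; }

function tokenize(utterance) {
    return utterance.toLowerCase().split(" ").filter(Boolean);
}

// Returns the tokens left after matching, or null if the expression does not match.
function match(expression, tokens) {
    if (typeof expression === "string") {
        return (tokens.length > 0 && tokens[0] === expression) ? tokens.slice(1) : null;
    }
    var i, rest;
    switch (expression.type) {
        case "sequence":
            // every sub expression must match, each one consuming tokens
            rest = tokens;
            for (i = 0; i < expression.value.length; i++) {
                rest = match(expression.value[i], rest);
                if (rest === null) { return null; }
            }
            return rest;
        case "alternative":
            // the first matching sub expression wins
            for (i = 0; i < expression.value.length; i++) {
                rest = match(expression.value[i], tokens);
                if (rest !== null) { return rest; }
            }
            return null;
        case "optional":
            // a failed optional part consumes nothing
            rest = match(expression.value, tokens);
            return rest !== null ? rest : tokens;
        default:
            return null;
    }
}

var rule = seq([alt(["turn", "switch"]), opt("the"), "kitchen", "light", alt(["on", "off"])]);

var full = match(rule, tokenize("Turn the kitchen light on"));
var withoutArticle = match(rule, tokenize("switch kitchen light off"));
var incomplete = match(rule, tokenize("turn the kitchen light"));
```

In this sketch the first two utterances match and leave no tokens over, while the third one fails because the final on/off word is missing.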

I then just set up a rule that is triggered by an update of the VoiceCommand item and runs the following script.
Content of the Main UI ES5 script:

var logger = Java.type('org.slf4j.LoggerFactory').getLogger('org.openhab.rule.' + ctx.ruleUID);

var OPENHAB_CONF = Java.type('java.lang.System').getenv('OPENHAB_CONF');
load(OPENHAB_CONF + '/automation/lib/javascript/personal/voiceCommandRules.js');

ruleBasedInterpreter.interpretUtterance(itemRegistry.getItem("VoiceCommand").getState().toString());

The code tries to find the item by its label (not by its unique name: names cannot contain spaces, so they cannot be tokenized into words) or by user-defined synonyms (the “synonyms” item metadata, a comma-separated list). Exactly one item must match, so the items should be labeled accordingly.
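
To illustrate the label lookup, here is a minimal standalone sketch of the narrowing loop, using a plain array of labels instead of the item registry. The labels and the function name `findSingleLabelMatch` are made up for this example, and synonyms are left out.

```javascript
function tokenize(text) {
    return text.toLowerCase().split(" ").filter(Boolean);
}

// Narrows the candidate labels token by token until at most one is left.
// Returns the matched label and the number of tokens used, or null.
function findSingleLabelMatch(labels, tokens) {
    var candidates = labels.map(function (label) {
        return { label: label, labelTokens: tokenize(label) };
    });

    var tokenIndex = 0;
    while (candidates.length > 1 && tokenIndex < tokens.length) {
        candidates = candidates.filter(function (candidate) {
            return candidate.labelTokens.length > tokenIndex &&
                candidate.labelTokens[tokenIndex] === tokens[tokenIndex];
        });
        tokenIndex++;
    }

    if (candidates.length !== 1) {
        return null; // no match, or still ambiguous
    }
    return { label: candidates[0].label, consumed: tokenIndex };
}

var labels = ["Kitchen Light", "Kitchen Fan", "Living Room Light"]; // hypothetical labels

var hit = findSingleLabelMatch(labels, tokenize("kitchen light on"));
var ambiguous = findSingleLabelMatch(labels, tokenize("kitchen"));
var unknown = findSingleLabelMatch(labels, tokenize("bedroom lamp"));
```

Note that, as in the posted code, the loop stops as soon as a single candidate remains, so `consumed` only counts the tokens that were needed to single out the item.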

If a command and an item are found, the command is sent to the item. User-defined functions can also be executed when a rule matches, and this is where it gets interesting: any JavaScript function can be executed.
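
As a rough sketch of such a user-defined function: the interpreter calls it with a single object whose `item` property is the matched item. The `toggle` function, the stubbed `events` object and the fake item below are made up so that the sketch runs outside openHAB.

```javascript
// Stand-in for the openHAB "events" global, so the sketch runs anywhere.
var sentCommands = [];
var events = {
    sendCommand: function (item, command) {
        sentCommands.push(item.getName() + " <- " + command);
    }
};

// Hypothetical custom function: toggles the matched item instead of
// sending a fixed command.
function toggle(parameter) {
    if (!parameter || !parameter.item) {
        return; // the rule matched, but no item was found
    }
    var item = parameter.item;
    var next = String(item.getState()) === "ON" ? "OFF" : "ON";
    events.sendCommand(item, next);
}

// Inside openHAB this would be registered along the lines of:
// addRule(seq(["toggle", opt("the"), itemLabel()]), toggle);

// Fake item for demonstration only.
var lamp = {
    getName: function () { return "Kitchen_Light"; },
    getState: function () { return "ON"; }
};
toggle({ item: lamp });
```

One thing to keep in mind: in the interpreter above, a function produced by `cmd(...)` takes precedence over the function passed to `addRule`, so a rule with a custom function should not contain a `cmd` expression.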

What do you think?
Useful or superfluous?
I’ve also seen HABot, but also some posts referring to it as “old”. Is HABot still actively used and developed, and should this kind of voice control be done with HABot instead? (I have not set it up and have no experience with it.)

denominator · August 26, 2021, 8:41am

This is excellent work.

My voice control goes through Google Home.

You can link openHAB to your Google Home account on your phone.

I use an iFan03 running third-party Tasmota firmware for my room lights and ceiling fan.

Here is the item setup:

Switch Dining_light "Dining Light" { ga="Light", channel="mqtt:topic:myMQTTBroker:fan1:Power1" }
Dimmer Dining_fan "Dining Fan" { ga="Fan" [ speeds="0=off:zero,1=slow:low:one:on,2=medium:two,3=high:three:100", lang="en", ordered=true, roomHint="Dining" ], channel="mqtt:topic:myMQTTBroker:fan1:fanspeed" }

Then you can just say to your phone “Hey Google, turn the Dining Light off” or “Turn the Dining Fan to slow”.

JanMattner · August 27, 2021, 7:00pm

That’s also interesting, sufficient for simple use cases, and it really seems pretty easy to set up.

(I leave out the privacy aspect of Google Home, Alexa & Co., that’s an individual decision)

For me, I need a more elaborate way, e.g. to control my rollershutters: on sunny days, I want some of them closed about halfway (to block enough sunlight and keep the house from heating up, while leaving them open enough to let some light in). Since they have different closing times, the target value differs for each one, and I do not want to remember every value.
So I just wrote a script with all values listed, and I only need to say the right voice command to move each rollershutter to its target value. Furthermore, the target value is not always hit exactly; there’s a “buffer” of about +/-10% (e.g. after the command “50” it may stop at “42”; if I send “50” again, the shutter just goes down and closes fully). The rollershutter should therefore only be triggered if the target value is outside this buffer range around the current value.
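
A minimal sketch of that buffer check could look like this. The item names and target values are made up, and the “+/-10%” is read here as 10 percentage points around the current position, which is an assumption.

```javascript
var TOLERANCE = 10; // assumed: percentage points around the current position

// Hypothetical per-shutter targets for sunny days.
var sunnyDayTargets = {
    "Shutter_Living": 50,
    "Shutter_Kitchen": 60
};

// True if the target lies outside the buffer around the current position.
function needsMove(current, target, tolerance) {
    return Math.abs(current - target) > tolerance;
}

// Returns the list of shutters that actually have to be moved.
function planMoves(currentPositions, targets, tolerance) {
    var moves = [];
    Object.keys(targets).forEach(function (name) {
        var current = currentPositions[name];
        if (typeof current !== "number") {
            return; // position unknown, skip this shutter
        }
        if (needsMove(current, targets[name], tolerance)) {
            moves.push({ item: name, target: targets[name] });
        }
    });
    return moves;
}

var moves = planMoves({ "Shutter_Living": 42, "Shutter_Kitchen": 0 }, sunnyDayTargets, TOLERANCE);
```

Here the living room shutter at 42 is within the buffer of its target 50 and is left alone, while the kitchen shutter at 0 is scheduled to move to 60. In a rule, each entry in `moves` would then be sent to the corresponding item.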

Such things do not seem to be easily realizable with those out-of-the-box solutions.

denominator · August 28, 2021, 12:38am

There are only two things your program is going to do and that is work or not work. How you come about programming your system has everything to do with you.

You could try using the rollershutter item and tagging it like my fan; that way you can map different values to spoken words.

{ ga="Fan" [ speeds="0=off:zero,1=slow:low:peanut:on,

“Hey Google, turn fan to peanut”

This is all just to get voice input into openHAB. I couldn’t care less if Google knows I used voice at 2am to change the temperature of my aircon without opening my eyes.

This is me telling my smart kettle to behave

This is the most frustrating thing about any system. Computers are supposed to do what you tell them to do. When they don’t, you need to tell them again, politely.

What you could try, to overcome real-world constraints, is to decouple your input from the output.
Create a dummy rollershutter item that is connected only to voice. Then, in a rule or script, control the shutter depending on where it is.

When the dummy rollershutter item changes, decide what you want the shutter to do. Then you can use a routine to move it to the position you want.

As you grow your system, you may even have it decide for you which position the shutter should be in. For example, if it’s sunny outside and the temperature is above the set temperature, close to 20%; if no one is home, close all the way.
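
A rough sketch of such a decision, under one possible reading of the sentence above: it only acts when it is sunny and too warm, and then picks 20% if someone is home and fully closed otherwise. The function name and the fields of the `state` object are made up, and 100 is assumed to mean fully closed.

```javascript
// Returns the desired shutter position (0-100), or null to leave it as it is.
// Assumption: 100 means fully closed.
function chooseShutterPosition(state) {
    var tooWarm = state.temperature > state.setTemperature;
    if (!(state.sunny && tooWarm)) {
        return null; // nothing to do
    }
    return state.someoneHome ? 20 : 100;
}

var atHome = chooseShutterPosition({ sunny: true, temperature: 26, setTemperature: 24, someoneHome: true });
var away = chooseShutterPosition({ sunny: true, temperature: 26, setTemperature: 24, someoneHome: false });
var cloudy = chooseShutterPosition({ sunny: false, temperature: 26, setTemperature: 24, someoneHome: true });
```

The returned value would then be sent to the real rollershutter item by the rule that reacts to the dummy item.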

JanMattner · August 28, 2021, 7:06pm

Ah that’s interesting, thanks for the hint.

As I’ve written, that’s a personal decision that everyone needs to make on their own. Privacy IS important for many people, and for others not (so much), which is OK either way. I didn’t state my opinion because I did not want to start a rant about it and hijack this topic.

Btw, the constraint about the inaccurate stop value and the rollershutter just going down comes from the Eltako actuator.
[Edit: it just occurred to me that it’s more likely an error in the EnOcean binding than in the actuator. Maybe I’ll look into that to really solve the issue.]
For me, it is easier to add a workaround on the software side than to solve the issue on the hardware/firmware side.
Thus, I’m always in favor of an easily extensible solution. In my experience (private and professional software development), out-of-the-box solutions like the Google Home or Amazon Alexa integrations are great for all the “mainstream” (i.e. most used/wanted) use cases, but you need to dive deep to solve edge cases. Usually almost everything is possible, but (at least from what I know) in this case (adding a buffer on the software side) it would be difficult with the Google Home integration (i.e. I would need to change the Java code of the add-on; I’m not experienced in Java development, so I try to avoid that).
Please correct me if I am wrong.

Sounds interesting. What is the benefit of using a dummy/virtual rollershutter here? I mean, somehow the program needs to conclude from the voice input which item (group) is meant and apply the action to it, directly on the real items. Why use a dummy?
There is probably something I am missing or not understanding here. Could you please elaborate or write an example?

Good idea. I’ll start small, but I’ll come back to this once I have more sensors and data integrated.

JanMattner · September 21, 2021, 9:21pm

FYI: I’ve published the code on GitHub, see https://github.com/JanMattner/voice-control-openhab

Feel free to contribute!