A few months ago I experimented with CMU Sphinx as a purely local speech-to-text add-on for openHAB (https://github.com/openhab/openhab2-addons/pull/2220), that is, without any dependency on a cloud service (and the privacy concerns that go with it). CMU Sphinx is also able to recognize languages not supported by e.g. Alexa. Development has stalled for the last 3 months, and there is much work to do to simplify the configuration, but I thought it would be awesome to have other users’ feedback first in order to assess whether it’s worth pursuing further.
Therefore, I’m creating this thread to raise awareness and try to rally up people interested in seeing it in future openHAB distributions!
So, anyone interested?
You will need:
- some knowledge of openHAB;
- a working, decent microphone, preferably a far-field microphone to pick up your voice from across the room (you can also buy a PlayStation Eye USB microphone for less than $10/10€ on Amazon); before attempting to have it work with openHAB, try recording your voice with a recorder program to make sure the source, volume, etc. are set properly;
IMPORTANT NOTE: the microphone should be able to record at 16kHz/16-bit/mono, otherwise it won’t work. On Linux (RPi etc.) you might have to tweak /etc/asound.conf for ALSA, or the equivalent configuration if you’re using PulseAudio; this helped me with the PlayStation Eye. YMMV.
- Most of all, a hacking spirit and some free time/patience/perseverance
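If you want to double-check a test recording before involving openHAB, here is a minimal sketch using Python's standard `wave` module. The function name `check_recording` is made up for illustration, and the defaults (16 kHz, 16-bit, mono) are what CMU Sphinx acoustic models generally expect; adjust them if your setup differs.

```python
import wave


def check_recording(path, rate=16000, sample_width_bytes=2, channels=1):
    """Return a list of mismatches between a WAV file and the expected format.

    Hypothetical helper: the defaults are an assumption (16 kHz, 16-bit, mono).
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != rate:
            problems.append(
                f"sample rate is {wav.getframerate()} Hz, expected {rate} Hz")
        if wav.getsampwidth() != sample_width_bytes:
            problems.append(
                f"samples are {wav.getsampwidth() * 8}-bit, "
                f"expected {sample_width_bytes * 8}-bit")
        if wav.getnchannels() != channels:
            problems.append(
                f"{wav.getnchannels()} channel(s), expected {channels}")
    return problems
```

Record a few seconds to a WAV file with your recorder program and call `check_recording("test.wav")`; an empty list means the file matches. This only tells you what the recorder wrote to disk, not what openHAB will get from the audio source, so treat it as a first sanity check.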
Here’s how to get started:
First, download org.openhab.voice.cmusphinx-2.2.0-SNAPSHOT.jar
Drop it into your openHAB distribution’s addons folder to load it.
You need to download several files from CMU Sphinx on Sourceforge according to your language - an acoustic model and a dictionary:
a. for US English:
- the acoustic model: https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English/cmusphinx-en-us-ptm-5.2.tar.gz/download
- the dictionary: https://github.com/cmusphinx/sphinx4/raw/master/sphinx4-data/src/main/resources/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict
b. for German:
- the acoustic model: https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/German/cmusphinx-de-ptm-voxforge-5.2.tar.gz/download
- the dictionary: https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/German/cmusphinx-voxforge-de.dic/download
c. for French:
- the acoustic model: https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/French/cmusphinx-fr-ptm-5.2.tar.gz/download
- the dictionary: https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/French/fr.dict/download
d. for other languages:
Download resources as available from https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/
NOTE: you won’t be able to use Eclipse SmartHome’s standard human language interpreter (the ‘Built-in Interpreter’) since it only supports English, German and French at the moment. Feel free to submit a PR to ESH to add support for your language. You will however be able to use the ‘Rule-based interpreter’ which sends recognized speech to an item.
Create a directory, for instance /opt/openhab/conf/stt, extract the acoustic model into a sub-directory of it, and place the dictionary file there as well.
You will need to create a grammar file by hand (for now), in JSGF format (https://www.w3.org/TR/jsgf/) which will describe the sentences CMU Sphinx will be able to recognize. If you’re using the built-in interpreter, those sentences will have to be those supported by the interpreter. Your grammar should also include a keyword (or hotword, or “magic word”) of your choosing, which you will configure later. You will need to speak this word before giving an actual command. Note: it doesn’t need to be one word, expressions like “hey openhab” can work too.
Create a directory in your filesystem (for instance /opt/openhab/conf/stt/grammar) and write a grammar file named commands.gram (the extension is important) in it.
Here are some examples:
- English:
#JSGF V1.0;
grammar commands;
<location> = living room | kitchen | bedroom | corridor | bathroom | garage;
<thing> = lights | heating | fan | blinds;
<item> = <location> <thing>;
<onoff> = on | off;
<turn> = turn | switch;
<put> = put | bring;
<increase> = increase | brighten | harden | enhance;
<decrease> = decrease | dim | lower | soften;
<color> = white | pink | yellow | orange | purple | red | green | blue;
<switchcmd> = <turn> [the] <item> <onoff>;
<increasecmd> = <increase> the <item>;
<decreasecmd> = <decrease> the <item>;
<upcmd> = <put> the <item> up;
<downcmd> = <put> the <item> down;
<colorcmd> = [set] [the] color [of] the <item> [to] <color>;
<keyword> = openhab;
public <command> = <keyword> | <switchcmd> | <increasecmd> | <decreasecmd> | <upcmd> | <downcmd> | <colorcmd>;
- German (untested, please correct if necessary):
#JSGF V1.0;
grammar commands;
<location> = küche | büro | schlafzimmer | badezimmer | garage;
<thing> = beleuchtung | heizung | ventilator | rollläden;
<item> = <location> <thing>;
<einaus> = ein | aus;
<schalte> = schalt | schalte;
<mache> = mach | mache;
<mehr> = heller | mehr;
<weniger> = dunkler | weniger;
<farbe> = weiss | pink | gelb | orange | lila | rot | grün | blau;
<dendiedas> = den | die | das;
<switchcmd> = <schalte> [<dendiedas>] <item> <einaus>;
<increasecmd> = [<schalte> | <mache>] <dendiedas> <item> <mehr>;
<decreasecmd> = [<schalte> | <mache>] <dendiedas> <item> <weniger>;
<upcmd> = <mache> <dendiedas> <item> hoch;
<downcmd> = <mache> <dendiedas> <item> runter;
<colorcmd> = [<schalte>] [<dendiedas>] <item> [auf] <farbe>;
<keyword> = openhab;
public <command> = <keyword> | <switchcmd> | <increasecmd> | <decreasecmd> | <upcmd> | <downcmd> | <colorcmd>;
- French:
#JSGF V1.0;
grammar commands;
<command> = allumer | éteindre | activer | stopper | désactiver | couper | augmenter | diminuer | monter | descendre;
<lela> = le | la | les | l;
<poursurde> = pour | sur | du | de;
<color> = blanc | rose | jaune | orange | violet | rouge | vert | bleu;
<item> = bureau | salon | table | chambre | cuisine | volet;
<keyword> = maison;
public <order> = <keyword> | <command> [<lela>] <item> | couleur <color> [<poursurde>] [<lela>] <item>;
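Since the grammar is written by hand, it's easy to reference a rule you forgot to define or to leave out the public rule. Here is a rough Python sketch that catches those two mistakes. `check_grammar` is a hypothetical helper, not part of the add-on or of CMU Sphinx, and it only understands the simple subset of JSGF used in the examples above (no comments, imports or quoted tokens).

```python
import re


def check_grammar(text):
    """Return a list of problems found in a simple JSGF grammar (sketch only)."""
    problems = []
    # Statements are separated by ';' in the subset of JSGF handled here.
    statements = [s.strip() for s in text.split(";") if s.strip()]
    if not statements or not statements[0].startswith("#JSGF"):
        problems.append("missing #JSGF header")

    defined, referenced, has_public = set(), set(), False
    for stmt in statements:
        if "=" not in stmt:
            continue  # the header or the "grammar <name>" declaration
        head, body = stmt.split("=", 1)
        name = re.search(r"<([^>]+)>", head)
        if name is None:
            problems.append(f"cannot parse rule: {stmt!r}")
            continue
        defined.add(name.group(1))
        if head.strip().startswith("public"):
            has_public = True
        # Every <name> on the right-hand side is a reference to another rule.
        referenced.update(re.findall(r"<([^>]+)>", body))

    for missing in sorted(referenced - defined):
        problems.append(f"rule <{missing}> is used but never defined")
    if not has_public:
        problems.append("no public rule")
    return problems
```

Read your commands.gram into a string and pass it to `check_grammar`; an empty list means nothing obvious is wrong. It says nothing about whether CMU Sphinx will actually accept the file, so the openHAB log remains the real test.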
- You can use a word not in the dictionary (for example “openhab”), but then you have to add it: open your dictionary file with a text editor and add a line for it. Look for words with a similar pronunciation and try to derive one for your word.
For example, in German “opensuse” is Q OOH P AX N Z UU Z AX and “haben” is HH AAH B AX N, so for “openhab” you would add (please confirm this):
openhab Q OOH P AX N HH AAH B
In French it would be, similarly:
openhab oo pp ee nn aa bb
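To find out which words of your grammar still need a dictionary entry, something like the following Python sketch could help. All three function names are made up for this post. It assumes the dictionary has one entry per line with the word as the first field, and that alternate pronunciations are written as `word(2)`, as in the CMU dictionaries I have seen. The comparison is case-sensitive on purpose.

```python
import re


def grammar_words(grammar_text):
    """Collect the literal words used in a simple JSGF grammar."""
    words = set()
    for stmt in grammar_text.split(";"):
        if "=" not in stmt:
            continue  # skip the header and the grammar declaration
        body = stmt.split("=", 1)[1]
        body = re.sub(r"<[^>]*>", " ", body)      # drop rule references
        body = re.sub(r"[|\[\]()*+]", " ", body)  # drop JSGF operators
        words.update(body.split())
    return words


def dictionary_words(dict_lines):
    """Collect the words defined in a dictionary (first field of each line)."""
    words = set()
    for line in dict_lines:
        fields = line.split()
        if fields:
            # Strip the "(2)" suffix used for alternate pronunciations.
            words.add(re.sub(r"\(\d+\)$", "", fields[0]))
    return words


def missing_words(grammar_text, dict_lines):
    """Return the grammar words that have no dictionary entry, sorted."""
    return sorted(grammar_words(grammar_text) - dictionary_words(dict_lines))
```

Pass the contents of commands.gram as a string and the open dictionary file (or a list of its lines) to `missing_words`. Anything it returns, most likely your keyword, needs a line added to the dictionary by hand as described above.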
Go to Paper UI to configure several things:
a. in Add-ons > Voice, install a text-to-speech engine compatible with your system and language (and configure it) - MANDATORY;
b. In Configuration > System:
under Audio, set Default Source to System Microphone and Default Sink to System Speaker
Don’t forget to save!
under Regional Settings, set Language to e.g. en/de/fr and Country / Region to e.g. US/DE/FR
Don’t forget to save!
under Voice:
- Default Text-to-Speech: set to the TTS engine you installed above;
- Default Speech-to-Text: set to CMU Sphinx;
- Default Voice: configure according to your TTS engine - the voice MUST match the language/region you chose;
- Default Human Language Interpreter: you can use either one for English/German/French and are limited to the Rule-based interpreter in other languages (see above); for the latter, the item receiving the commands has to be configured in Configuration > Services > Voice > Rule Voice Interpreter;
- Default Keyword Spotter: set to CMU Sphinx;
- Magic Word: set to the keyword you defined in your grammar;
- Listening Switch: you can specify here a Switch item which will be switched on and off when the system is listening for a command after spotting the magic word. For example, you could choose to map it directly to a lightbulb or have some rules to play a sound.
Don’t forget to save!
c. in Configuration > Services > Voice > CMU Sphinx Speech-to-Text:
Locale: set to e.g. en-US or de-DE or fr-FR
Acoustic model path: set to the path to the directory containing the acoustic model files, e.g. /opt/openhab/conf/stt/cmusphinx-fr-ptm-5.2
Dictionary file path: set to the path to the file containing your dictionary, e.g. /opt/openhab/conf/stt/fr.dict
Language model path: leave blank (important!)
Grammar path: set to the path to the directory containing the grammar files, e.g. /opt/openhab/conf/stt/grammar
Grammar name: set to the name of the file (without the extension) containing the grammar, e.g. commands
Leave Start listening off for now and Save.
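A wrong path is an easy mistake to make in this form, so it can be worth checking the values before looking at the logs. Here is a small Python sketch; `check_stt_paths` is a hypothetical helper and its parameters simply mirror the fields above. The only assumption it makes is that the grammar file is the grammar name plus the .gram extension inside the grammar directory.

```python
from pathlib import Path


def check_stt_paths(acoustic_model_dir, dictionary_file, grammar_dir, grammar_name):
    """Return a list of problems with the configured paths (sketch only)."""
    problems = []
    if not Path(acoustic_model_dir).is_dir():
        problems.append(f"acoustic model directory not found: {acoustic_model_dir}")
    if not Path(dictionary_file).is_file():
        problems.append(f"dictionary file not found: {dictionary_file}")
    # The grammar name is configured without its extension.
    grammar_file = Path(grammar_dir) / f"{grammar_name}.gram"
    if not grammar_file.is_file():
        problems.append(f"grammar file not found: {grammar_file}")
    return problems
```

With the example values from this post you would call `check_stt_paths("/opt/openhab/conf/stt/cmusphinx-fr-ptm-5.2", "/opt/openhab/conf/stt/fr.dict", "/opt/openhab/conf/stt/grammar", "commands")`. Run it as the same user as openHAB if you can, since it does not check file permissions separately.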
Check your openHAB logs and look for a line like:
[INFO ] [usphinx.internal.CMUSphinxSTTService] - CMU Sphinx speech recognizer initialized
If it doesn’t appear, look for errors in the log and try to fix them.
- Now you can go back to Configuration > Services > Voice > CMU Sphinx Speech-to-Text in Paper UI and turn on Start listening:
You will hopefully see this log line appearing:
[INFO ] [cmusphinx.internal.CMUSphinxRunnable] - CMU Sphinx: StreamSpeechRecognizer recognition started
- You may now start speaking your keyword, and if it’s recognized, you will see:
[INFO ] [cmusphinx.internal.CMUSphinxRunnable] - Keyword recognized: hey openhab, speak command now
(the “listening switch” you configured will also turn on)
Then speak a command from your grammar; if it’s recognized, you’ll see it in the logs as well:
[INFO ] [cmusphinx.internal.CMUSphinxRunnable] - Command recognized: couleur bleu pour le bureau
If the text-to-speech engine is properly configured, the voice will say either ‘Ok.’ or the error encountered. You will also see it in the logs.
If you run into trouble, you can lower the log threshold: in the openHAB Console, type:
log:set DEBUG org.openhab.voice.cmusphinx
Every recognized sentence, valid or not, will appear in the log at the DEBUG level along with other messages.
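The log gets noisy at DEBUG level, so to get an overview of what was actually recognized you could filter it with a few lines of Python. This sketch only matches the two INFO messages quoted earlier in this post ("Keyword recognized: ..." and "Command recognized: ..."); `recognized_events` is a made-up name, and I have not tried to match the DEBUG messages since their exact wording isn't shown here.

```python
import re

# Matches the two INFO messages quoted in this post; the trailing
# ", speak command now" on keyword lines is dropped from the captured text.
RECOGNIZED = re.compile(
    r"CMUSphinxRunnable\] - (Keyword|Command) recognized: "
    r"(.+?)(?:, speak command now)?$"
)


def recognized_events(log_lines):
    """Return (kind, text) tuples for every keyword/command line found."""
    events = []
    for line in log_lines:
        match = RECOGNIZED.search(line.rstrip("\n"))
        if match:
            events.append((match.group(1).lower(), match.group(2)))
    return events
```

Open your openHAB log file and pass the file object (or a list of lines) to `recognized_events`. Comparing the returned commands with what you really said gives a rough idea of how well the grammar and the microphone are working.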