Getting Rhasspy working with OpenHAB

What / Why / the Ultimate Goal

To have completely self-hosted and private multi-room voice control of openHAB, using almost entirely F/LOSS software and hardware[1], and without relying on any “cloud” services or multinational companies who seem to be in the business of gathering all your data, like when you are home or not, or what you say privately to your wife (which is very creepy).

Note 1: the only component I think is not F/LOSS would be the RPi bootloader (if you are using an RPi), but there are hardware alternatives.

Oh, that can never happen (you say)!

:wink:

Background

I recently received my ReSpeaker USB Mic Array, which I had planned on using with snips.ai. Today I set out to implement this and, brushing up on my reading again, learned that Snips sold out to Sonos and will be discontinuing console access, essentially rendering it useless for our purposes.

After reading that thread (and others) and doing some further research, it seems to me that Rhasspy is by far the most feature-complete and stable alternative out there right now, and it also appears to be completely F/LOSS (unlike Snips, who never really fully released all their source code). In short, I feel confident enough to invest my time moving forward with this solution.

Progress Thus Far

In production I plan on having a number of satellites (one in each room) on lower-powered devices streaming audio to a central, more powerful SBC (e.g. ODROID-XU4) doing the heavy lifting (STT, training, and intent recognition). But for now I am just trying to get this working on my plain amd64 Debian Buster desktop first.

I plugged the ReSpeaker USB Mic Array into my desktop, and it immediately appeared in PulseAudio Volume Control (yay for truly F/LOSS/H!).

I don’t use Docker or Home Assistant (Hass.io), so I installed Rhasspy into a Python virtual environment. This went pretty seamlessly, and after fiddling with a few settings in the web interface and PAVC I was quickly training Rhasspy by writing sentences. The recognition was actually pretty good right out of the box with default settings.

Next I started trying to get these intents into openHAB. It seemed to me that publishing to MQTT might be the easiest way. But I got so lost in trying to parse the JSON that my head now hurts, and I come here for help. I even installed MQTT Explorer along the way, and I am pretty sure I have the correct topic (hermes/intent/DeskLights for instance, as set in Rhasspy).
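For anyone following along: what arrives on that topic is a JSON payload describing the recognized intent. Simplified, and with made-up values (the exact fields vary by Rhasspy version and protocol), it looks something like this:

{
  "input": "turn the right desk light on",
  "intent": {
    "name": "DeskLights",
    "confidence": 1.0
  },
  "slots": {
    "item": "mysensors_light_bcc8b6a2_light_1_1_status",
    "status": "ON"
  }
}

The parsing job is basically digging those slot values out of the JSON.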

I have searched and read several forum posts, and the gist seems to be: create a String Item to receive the MQTT JSON payload, then write a rule that parses it and commands something else. That seemed overly complicated to me.

The first Item I thought I would try to control was a simple light switch (via the MySensors binding). But I couldn’t figure out how to simply connect the two.

Taking a Strategic Step Back

I also wonder if I should maybe just have Rhasspy do the speech-to-text, and then let openHAB’s rule-based voice interpreter parse the text? I think there is a way to send just the text from Rhasspy.

I am not sure what the best way forward is at this point, so any help would be greatly appreciated. And with Snips.ai’s recent sale to Sonos, I don’t think I will be the only one asking. In fact, I was a little surprised not to find a thread about this already. So I am sure that if we are able to figure something out, others would also be helped.

4 Likes

I suggest you take a look at the Mycroft skill for OH.
I hadn’t heard about Rhasspy before. I am wondering why they “recommend” Mycroft for those unafraid of leaking privacy, though. AFAIK Mycroft runs without Internet access, too (for speech-to-text as well as for intents).

Thanks for the input.

All the way at the bottom of the Rhasspy Docs page (linked above) under the last heading Intended Audience it says:

Rhasspy is intended for advanced users that want to have a voice interface to Home Assistant, but value privacy and freedom above all else. There are many other voice assistants, but none (to my knowledge) that:

  1. Can function completely disconnected from the Internet
  2. Are entirely free/open source
  3. Work well with Home Assistant, Hass.io, and Node-RED

If you feel comfortable sending your voice commands through the Internet for someone else to process, or are not comfortable with rolling your own Home Assistant automations to handle intents, I recommend taking a look at Mycroft.


In the meantime, I made some progress. There are quite a lot of different ways to do this apparently.

I tried to link the MQTT item directly to the switch. This worked for changing the status, but did not actuate the relay (verified both in the console and with the Basic UI switch).

I am currently following this thread which, although about Snips, is very similar in its use of JSON and MQTT. The suggested approach is basically to link the MQTT topic to a String Item, and then write a rule that, whenever the Item changes, extracts the values and issues commands accordingly.

I thought I read somewhere about a “things and channels” way to do it, but so far I have only heard it alluded to, with no direct instructions.

I feel like I can probably get the way I linked to working, but it just doesn’t feel right or ideal.

1 Like

Mycroft can run without internet access, but it’s really not great that way. They actually recommend using Google STT. This might change as they are cooperating with the Mozilla DeepSpeech project, but AFAIK that is not ready for prime time yet. I have also heard of a lot of people having trouble with Picroft (their Raspberry Pi implementation) and with ReSpeaker mics. They also don’t have the concept of satellites like Snips did.
Rhasspy is actually well worth looking at. It ties together many different open-source projects that deal with all parts of the workflow in a nice way, and gives you a choice of what to use for every step.
It just gives you deeper access than Mycroft would, and it’s easier to implement your own custom intents and all that 100% on-device, including training of things like statistical language models.
Johannes

2 Likes

The one where you link the MQTT channel to a text Item and then parse the JSON in a rule triggered by its change is the way to go.
As you build your assistant further you will have more complex intents where you might want to extract several values to act on in openHAB, which works best with a rule.
I once wrote a tutorial for Snips and Jython on this forum where many things would probably apply to connecting Rhasspy.
You can find it if you search for snips and jython.
That might give you some inspiration.
Johannes

2 Likes

Yes, I came across your tutorial (I think this one?), which helped me in figuring out the linking, but once you started with your (seemingly elegant?) code, it went right over my head. :slight_smile: Maybe I will get there one day, but in the meantime hopefully my post will help my fellow mere mortals and other lower-level wizards like myself to at least get started. :smiley:

And yes, I found the Rhasspy interface quite nice and full of choices I have only begun to explore… Here is a video I came across (for others, I know you are already familiar):


In particular, there are lots of nice substitution and other options when defining your outgoing JSON structure from Rhasspy. So, by thinking carefully about this, I will probably end up with only one Rhasspy MQTT Thing, with a number of defined sub-keys in the JSON to actuate different lights, etc.

I feel I am well on my way to getting the “MQTT to string to rule to command” way working, and will post my working config for posterity when I do.

Thank you for confirming I am moving in the right direction though!

1 Like

After reading that thread (and others) and doing some further research, it seems to me that Rhasspy is by far the most feature-complete and stable alternative out there right now, and it also appears to be completely F/LOSS

I guess that depends a bit on your preferences… if you are looking for a voice assistant that works with openHAB out of the box and is already set up to understand a lot of smart-home-related commands, you may also want to check out SEPIA. The current release version is still a bit limited, but further down in that thread you will find a test build of the (soon to be released) next version that understands a lot more smart home devices, rooms, and actions.

If you like to train your own speech recognition, that is possible with SEPIA, but at the moment it is more comfortable to do with Rhasspy, that’s true :wink:

I have been very happy with Rhasspy so far, but good luck to you with your project! Multiple competing F/LOSS options only make things better for all of us little people!

I have been thinking more about the “ontology” or how I might want to structure this. Perhaps a few different intents like:

  • onoff - could be used for any kind of on/off Item (switches, etc.)
  • dimmer/slider - for dimmable Items (percent values)
  • other - weather or other queries? Play music, etc. (perhaps each custom?)

This works because it is easy to pass node and status information in JSON. You can even convert from normal speech (e.g. “living room light”) to an Item name (e.g. node_2g43q_status) right inside Rhasspy! I suppose that will become evident when I post some examples…

I need to think on this and keep playing; I suppose the structure will reveal itself as I go along. But already I can see why some of the more complex types of parsing code have been created.

I did get my 2 lamps working by voice. It was a big moment for me, something I wanted to do for a long time with a fully open hardware and software stack, and completely in-house (no cloud!). So that was pretty exciting! I will post up my working config, as there were a couple of small syntax changes from other examples I read.

Anyway, yeah no sooner did I get the lights working than my mind began racing at all the possibilities and that’s when I really started thinking about the structure and how I want to build this out going forward…

EDIT: Oh, I forgot to add: the main problem actually turned out not to be parsing the JSON (which is pretty easy, once you get your head around it) but rather that I somehow had the wrong Item linked to my MQTT Thing channel! :confounded: Once I found that by carefully reviewing everything a second (or third?) time, I was off to the races… :smiley:

I know the temptation of building from scratch :grin: Good luck to you too :slightly_smiling_face:

I am not sure that is a good characterization of Rhasspy. I think any bespoke implementation of openHAB in general is going to require a lot of custom “implementation”, which some would even call “programming”, depending on your viewpoint.

Having said that, I found Rhasspy to be quite polished and usable right out of the box with only very minimal configuration (choosing the microphone, etc.). In fact, I was amazed at all the options available, the polish level, and the documentation, especially considering that it appears to be only one guy working on it(!)

When I say “from scratch” I mean you are thinking about how to build your intents and your language model, train your data, invent ontologies, write the service that puts everything into action, etc. If you want to do this, I guess Rhasspy is a good tool.
Actually, in SEPIA there is an interface for tools like Rhasspy where you can use your intents as one step in the NLU chain and a precursor to the dialog management.
From what I’ve heard from Michael (he created Rhasspy), the resulting STT model might even be compatible with SEPIA, since he’s using Kaldi as well (besides Pocketsphinx). I haven’t had time to try yet, but I will at some point :slight_smile:
What I mean is that comparing Rhasspy to SEPIA is like comparing tomatoes and cheese with a pizza :sweat_smile: (very good tomatoes and cheese, I admit).

I figured there would be a lot of overlap, as you seem to be using many of the same underlying (F/LOSS) tools.

I was wondering if it might be a duplication of effort, until I realized that your tools seem to be mostly written in Java, whereas Rhasspy is mostly written in Python. The fact that almost all of these language and AI tools also seem to be written in Python pushed me in that direction too (although arguments could be made either way, especially when integrating with a Java-based tool like openHAB).

For that matter, one could use Rhasspy just for the STT and then do the intent recognition right in openHAB. I may look into that at some point, because then you could also do things like send XMPP messages to openHAB as another method of control in addition to voice.
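If I go that route, something like this might be all that is needed to hand the transcribed text to openHAB (a sketch based on the openHAB REST API, which has a POST /rest/voice/interpreters endpoint for the default human language interpreter; host and port are placeholders):

curl -X POST -H "Content-Type: text/plain" \
     --data "turn on the right desk light" \
     http://openhab-host:8080/rest/voice/interpreters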

On the other hand, there are so many different choices of intent recognition already built into Rhasspy, and the default one (fsticuffs) in particular works great right out of the box and seems well suited to the domain of home automation.

There really are so many different potential ways to set this up… It’s wonderful! I really don’t see myself being limited in any way when using Rhasspy, which is one of the main reasons I decided to use it (and the same reason I chose openHAB).

1 Like

Yes, that is true. Tools like spaCy and Rasa, for example, are wonderful for building natural language understanding code. I like them so much that I built a super lightweight Python server that is deeply integrated into SEPIA’s chain of NLU modules :face_with_hand_over_mouth: … but in the end it doesn’t really matter that much, since nowadays almost everything has a REST interface :smiley:
Currently SEPIA services have to be written in Java, but I hope to simplify the process so much that I can put a tool like Blockly (the graphical programming thingy) on top… well, let’s see ^^

There really are so many different potential ways to set this up… It’s wonderful! I really don’t see myself being limited in any way when using Rhasspy, which is one of the main reasons I decided to use it (and the same reason I chose openHAB)

Hehe, yes, it’s true :grin: . Currently my goal for SEPIA + openHAB is to give users a tool that just works out of the box with minimal configuration. When you have a server running openHAB you already have Java installed, so you can simply place the SEPIA server next to it, do a 5-minute setup, open the client, and tell the system things like “switch on the lights in the living room” :innocent: … and if you reach the limits of SEPIA you can start playing around with all the developer stuff… if you like :smiley:

I guess it’s starting to get off-topic now, since this thread is about getting Rhasspy working :sweat_smile: but my hope for the future is that all these great open-source tools work well together and profit from each other, instead of running in parallel while everything gets fragmented :crossed_fingers::expressionless: . I’ll definitely keep an eye on Rhasspy and add some interfaces where it makes sense :slight_smile:

1 Like

Rhasspy vs SEPIA Comparison

I think SEPIA might be a better fit for that use case. At any rate, this thread makes for an interesting comparison. I added some keywords under a heading to help the search robots. :wink:

Now back to OT…

Getting Rhasspy Working with openHAB

I suppose now is as good a time as any to go ahead and post my working config (otherwise I may never get around to it :smiley: ).

Rhasspy bills itself as being made for Home Assistant, but there is no reason we cannot also use it with openHAB. The way that Rhasspy integrates with HA is through MQTT using JSON, which makes it equally easy to use with openHAB!

0. MQTT Broker

Getting the MQTT broker working is outside the scope of this post. I will just state that I am using the Mosquitto broker on Debian, installed from the repositories, and not the openHAB built-in one (although it shouldn’t matter). I already had the MQTT broker working for something else before starting this.
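One note for completeness: the Thing definition in section 3.1 below references a broker Bridge (mqtt:broker:mosquitto), which you will need to have defined. It looks something like this (host and clientID are placeholders; add username/password if your broker requires them):

mqtt.things:

Bridge mqtt:broker:mosquitto "Mosquitto Broker" [ host="localhost", port=1883, secure=false, clientID="openhab" ]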

1. Rhasspy

Next I actually started working from the Rhasspy side. Get that all working first:

1.1. Install

Install by your preferred method. I used a Python virtual environment, but there are Docker images, etc.

1.2. Get your mic working.

You might have to “Tap to Record” in Rhasspy (Speech tab), then check PAVC and select your mic as the input for Rhasspy.

1.3. Training

Come up with some sentences, train Rhasspy, and then test recognition (see the video above and the documentation). Look at the output below. It’s really easy, actually.

1.3.1. Important Note - Conversions

I almost forgot: I do some conversions from common spoken language (e.g. “right [or] left desk light [or] lamp”) directly to openHAB Item names in this step, like so (read Training - Rhasspy docs to understand this syntax):

sentences.ini:

[Lights]
turn (right desk (light|lamp)){item:mysensors_light_bcc8b6a2_light_1_1_status} (on:ON | off:OFF){status}
turn (left desk (light|lamp)){item:mysensors_light_bcc8b6a2_light_1_2_status} (on:ON | off:OFF){status}
turn (both desk (lights|lamps)){item:gDeskLights} (on:ON | off:OFF){status}

You don’t have to do it this way, but I do think it is easier (though I am still thinking about it :wink: ). In this syntax, (light|lamp) matches either word, {item:...} emits a slot named item with the substituted value (here, the openHAB Item name), and on:ON maps the spoken word to the command to send. You will see why below when we write our openHAB rule.

1.4. Set up MQTT settings from the Rhasspy side

Should be straightforward.
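In a venv install these settings end up in your profile’s profile.json. From memory it contains something along these lines, but treat this as a sketch and use the web interface settings page as the source of truth, since the exact keys vary between Rhasspy versions:

profile.json (fragment):

"mqtt": {
    "enabled": true,
    "host": "localhost",
    "port": 1883
}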

2. Check MQTT Messages

Using https://mqtt-explorer.com/ (or similar), I made sure I could see the MQTT messages being sent.
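You can also watch from a terminal with the standard Mosquitto client tools (assuming the broker runs on localhost and the topic prefix matches what you set in Rhasspy):

mosquitto_sub -h localhost -v -t 'rhasspy/intent/#'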

3. openHAB Side

Finally, time to get things working on the openHAB side:

3.1. MQTT Broker Thing

First I made a Thing (on the Mosquitto broker Bridge) with a Rhasspy intent channel:

mqtt.things:

Thing mqtt:topic:mosquitto:rhasspy "Rhasspy voice control" (mqtt:broker:mosquitto) {
    Channels:
        Type string : intent_lights "Rhasspy Intent Lights" [ stateTopic="rhasspy/intent/Lights" ]
}

I called the intent Lights, but you could call it onoff (perhaps more appropriate, see my thoughts in the post(s) above) or whatever you like. It just has to match what you set up in Rhasspy (above): the intent name becomes the last part of the topic.

3.2. Item

Next, an Item to hold the incoming text string:

mqtt.items:

String rhasspy_intent_lights "Rhasspy Intent Lights" { channel="mqtt:topic:mosquitto:rhasspy:intent_lights" }

Just as I am typing this, I realize this is where I may have messed up earlier: I somehow ended up with two different Items and the wrong one linked, which caused me frustration later. Note that the channel ID in the link must exactly match the channel defined in the Thing (intent_lights here). I am still not too far from being a newbie in openHAB myself. :smiley:

3.3. Rule

Finally, a Rule that acts when the text string is updated. As I recall, some of the logging syntax changed here compared to other examples I found. I also think this way is simpler than other examples: it just gets the Item name and status from the JSON and uses them directly. But there are lots of different (and likely more elegant) ways you can structure this; I just want to post a working example as a starting point.

rhasspy.rules:

rule "Rhaspy"
when
    Item rhasspy_intent_lights received update
then
    var String intentItem = transform("JSONPATH", "$.item", rhasspy_intent_lights.state.toString)
    var String intentStatus = transform("JSONPATH", "$.status", rhasspy_intent_lights.state.toString)

    logInfo ("rhasspy_intent", "Rhasspy raw JSON: " + rhasspy_intent_lights.toString)
    logInfo ("rhasspy_intent", "Rhasspy intentItem: ", intentItem)
    logInfo ("rhasspy_intent", "Rhasspy intentStatus: ", intentStatus)

    intentItem.sendCommand(intentStatus)
end
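For reference, with the tagged sentences from section 1.3.1, the payload this rule expects on rhasspy/intent/Lights is flat JSON along these lines (depending on your Rhasspy version and protocol settings, the slots may instead arrive wrapped in a larger envelope, in which case you need to adjust the JSONPATH expressions, as in the reply below):

{"item": "mysensors_light_bcc8b6a2_light_1_1_status", "status": "ON"}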

Conclusion

Whew! :smiley:

OK, I think that covers most if not all of it. It’s still pretty fresh in my mind, luckily. Feel free to post questions if you get stuck though. But plan on doing some reading. These are very powerful and flexible tools.

Next Steps

  • Get a wake word working
  • Integrate more voice commands and services (weather, perhaps news, etc…)
7 Likes

Thanks for documenting your steps! I have used some parts of it. I am also experimenting with the same setup and made a small adjustment to the rules (filtering on slotName):

var String intentItem = transform("JSONPATH", "$.slots[?(@.slotName=='item')].value.value", Rhasspy_Intent.state.toString)
var String intentStatus = transform("JSONPATH", "$.slots[?(@.slotName=='status')].value.value", Rhasspy_Intent.state.toString)
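(For context: in the Hermes-style payload the slots arrive as an array, roughly like the fragment below, with fields simplified and values hypothetical, so the [?(@.slotName=='...')] filter picks the right entry regardless of slot order.)

"slots": [
    { "slotName": "item",   "value": { "value": "mysensors_light_bcc8b6a2_light_1_1_status" } },
    { "slotName": "status", "value": { "value": "ON" } }
]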

For the wake word I used Porcupine in Rhasspy, and now I am doing some tests with snips-wakeword. Just like you, in the exploration phase :slight_smile:

Great work!

I get the MQTT messages and can handle them. But how can I get Rhasspy talking to me?

I had to shelve this project for various reasons (life/time, etc.), so I never got that far. But from the research I did at the time (if I recall correctly), I was going to try to pipe the sound output from Rhasspy into something like snapserver, so that the home automation system could talk to the whole house (or at least whatever speakers/devices are tied into the system). I was also going to try to figure out some logic to know where the query came from (which room/device) and then perhaps reply only to that room. But like I said, I never got that far.

I was planning a distributed system, with microphones in most rooms on less powerful, inexpensive SBCs (RPi Zero equivalents, or similar) and one more powerful central unit (likely an ODROID-XU4) to do the actual voice processing. Rhasspy does allow for such an architecture, and it shouldn’t be too hard to set up, but I never got around to implementing it (yet).

I hope to get back to this soon(-ish), but if you figure out something in the meantime, please do post back for the benefit of others…

Hi there,

first of all, thanks to EVERYONE making openHAB this good and useful. After more than a year of passively taking benefit of this forum, I think it’s time to actively participate and contribute.

I am using openHAB 2.5.5.1 on an RPi 3B+ based on the openHABian image, plus deCONZ for the Zigbee part. About a week ago I uninstalled Snips after months of use, because of SONOS :frowning:
and installed Rhasspy 2.5 via Docker (I tried a venv first because of performance concerns, but that didn’t work for me; I think there are some broken dependencies, at least at the moment).

After some time fiddling around I got MQTT working (I needed to add the network host parameter to the run command for the container).
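For reference, my run command looked roughly like this (based on the Rhasspy documentation; adjust the profile name and paths to your setup, the important part for MQTT was --network host):

docker run -d --network host \
    --name rhasspy --restart unless-stopped \
    -v "$HOME/.config/rhasspy/profiles:/profiles" \
    --device /dev/snd:/dev/snd \
    rhasspy/rhasspy --user-profiles /profiles --profile de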

ACTUAL REPLY to your question :wink:

  • First of all you need the MQTT Action binding (for the publish() action used below)
  • As I understood it, it is not recommended to use “hermes/tts/say”; instead you should use “hermes/dialogueManager/startSession”
  • Then you have to send a JSON string, where “text” is followed by your payload (in my example, the variable time); for further information on the format of the string, look at the online documentation of Rhasspy and search for “Dialogue Manager”
  • The last important parameter is “site_id”, which tells which of possibly multiple sites has to react to that MQTT message (in my case I named it “rhasspy_central”)

Simple Example - working for me:

if (snipsIntent == "Zeit")
{
	var hour = now.getHourOfDay
		var minute = now.getMinuteOfHour
	var String time = ("Es ist "+hour.toString+" Uhr "+minute.toString)
	
	logInfo("SnipsPOWER ", "Es ist: "+time)

	//json-String für hermes/.../startSession
	var String jsonaction = '{
			"init": {
			"type": "action",
			"canBeEnqueued": true,
			"text": "'+time+'",
			"intentFilter": null,
		"sendIntentNotRecognized": false
			},
	"site_id": "rhasspy_central",
	"customData": null
	}'
			
	publish ("snipsmqtt", "hermes/dialogueManager/startSession", jsonaction)
	
 }
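A note for anyone on the MQTT version 2 binding, where the publish() action from the old MQTT Action add-on is not available: the rough equivalent is the broker Thing action (assuming a broker Thing with UID mqtt:broker:mosquitto):

val mqttActions = getActions("mqtt", "mqtt:broker:mosquitto")
mqttActions.publishMQTT("hermes/dialogueManager/startSession", jsonaction)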

PS: Please excuse my bad English, I am from Germany.

2 Likes

Hey nk_one,

thank you very much. I now use “hermes/dialogueManager/endSession” with the sessionId from my question.

I wonder how I should use Rhasspy with openHAB. Should I do everything with rules inside openHAB, use Node-RED to handle my actions, or use Python scripts (which is maybe a good way for future skills)? I have no experience with the last two options.

What would you recommend? Which way do you use to handle the communication between OH and Rhasspy?

Hi,

sorry for the late reply. In the end it is up to you how to use it.
Personally I love the possibilities openHAB gives me, so I actually use Rhasspy only for the voice part with the intent recognition.
Rhasspy publishes the intents via MQTT; I then extract them from the JSON string and let openHAB do the magic.

My snips.items (this uses the MQTT version 1 binding syntax):

String Snips_Intent "Snips Intent" { mqtt="<[snipsmqtt:hermes/intent/#:state:default]" }

An extract of my snipsPOWER.rules as an example:

rule "SnipsPOWER"
when
    Item Snips_Intent changed
then
    logInfo("SnipsPOWER", "Snips_Intent changed...")
    var String json = Snips_Intent.state.toString
    var String snipsIntent = transform("JSONPATH", "$..intentName", json)

    var String snipsSlot1Value = transform("JSONPATH", "$..slots[0].value.value", json)
    var String snipsSlot2Value = transform("JSONPATH", "$..slots[1].value.value", json)
    var String snipsSlot3Value = transform("JSONPATH", "$..slots[2].value.value", json)

    var String snipsValue

    logInfo("SnipsPOWER", "Intent: " + snipsIntent)
    logInfo("SnipsPOWER", "Slot1Value: " + snipsSlot1Value)
    logInfo("SnipsPOWER", "Slot2Value: " + snipsSlot2Value)
    logInfo("SnipsPOWER", "Slot3Value: " + snipsSlot3Value)

    if (snipsSlot1Value == "ON" || snipsSlot2Value == "ON" || snipsSlot3Value == "ON") { snipsValue = "ON" }
    if (snipsSlot1Value == "OFF" || snipsSlot2Value == "OFF" || snipsSlot3Value == "OFF") { snipsValue = "OFF" }

    ...

    if (snipsIntent == "TV")
    {
        if (snipsSlot1Value == "TVan")
        {
            if (poweroutletHSwitch.state.toString == "ON") sonyPower.sendCommand("ON")
            else
            {
                poweroutletHSwitch.sendCommand("ON")
                PowerTimer4 = createTimer(now.plusSeconds(180), [ |
                    sonyPower.sendCommand("ON")
                ])
            }
        }
        if (snipsSlot1Value == "TVaus")
        {
            sonyPower.sendCommand("OFF")
        }
        if (snipsSlot1Value == "picMuteOn")
        {
            TVIP_PictureMute.sendCommand("ON")
        }
        if (snipsSlot1Value == "picMuteOff")
        {
            TVIP_PictureMute.sendCommand("OFF")
        }
        if (snipsSlot1Value == "play")
        {
            sonyCmd.sendCommand("Play")
        }

    ...

The other two options you mentioned I don’t know either, and I am not willing to get into them at the moment.

Question to the community: at the moment I am using a weak speaker connected to the 2-mic Pi HAT for audio output, but I would like to use a Bluetooth speaker instead. Unfortunately I can’t get the Bluetooth speaker to show up as an audio output option in the Rhasspy Docker container. I already installed bluez-alsa on the host system (where openHAB is running too) and got the onboard Bluetooth of the RPi 3B+ connected to the speaker. Any ideas?