Mimic Text-To-Speech


IMPORTANT INFORMATION: The Mimic add-on is now included in the official openHAB 3.4 release. If you use version 3.4 or above, please get the add-on from the official repository instead of the marketplace. It will soon be removed from the marketplace.
EXCEPTION: if you use a sink that makes use of the openHAB audio servlet, such as Webaudio or Sonos, you have to use the marketplace version, which fixes a critical bug by letting you activate a workaround.

Mimic (version 3 and above) is an offline, open source text-to-speech engine designed by Mycroft A.I. for its eponymous voice assistant.

Description

It provides multiple voices, available in different languages and variants.
Its neural network voices rely on models of uneven quality: some are very good, some not so good. Try several to be sure you get the best one for your needs.
Mimic3 doesn’t need Mycroft; it can run standalone as a service or as a command line utility.
When launched as a web server, it exposes its capabilities through a web API. This TTS bundle makes use of this feature, so please take note: this openHAB TTS bundle is NOT standalone! It requires a Mimic web server running somewhere (on your openHAB computer or elsewhere on your network).
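
As a rough illustration of what talking to that web API looks like, here is a minimal Python sketch that builds a request URL for a Mimic 3 server. The `/api/tts` path and the `text` and `voice` query parameters are assumptions based on the Mimic 3 server's documented HTTP interface; port 59125 is the one used in the docker command later in this thread; the host and the voice name `en_UK/apope_low` are placeholders to adapt to your own setup.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a Mimic 3 server is reachable here; adjust host/port as needed.
MIMIC_URL = "http://localhost:59125"


def build_tts_url(text: str, voice: str, base_url: str = MIMIC_URL) -> str:
    """Build a GET URL for the server's /api/tts endpoint (assumed path)."""
    # urlencode percent-encodes both values, including '/' in voice names.
    query = urlencode({"text": text, "voice": voice})
    return f"{base_url}/api/tts?{query}"


def synthesize(text: str, voice: str, base_url: str = MIMIC_URL) -> bytes:
    """Fetch the audio for `text`; this needs a running Mimic 3 server."""
    with urlopen(build_tts_url(text, voice, base_url), timeout=30) as response:
        return response.read()


if __name__ == "__main__":
    # Only builds and prints the URL, so it runs without a server.
    print(build_tts_url("Hello from openHAB", "en_UK/apope_low"))
```

Calling `synthesize(...)` against a running server should return the audio bytes, which you can write to a `.wav` file to check the voice before configuring the add-on.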

Changelog

Version 3.4.1 Fix 2

  • Fix an issue when playing through an audio sink that uses the audio servlet

Version 3.4.1 Fix

  • Fix SSML
  • Fix an issue with some voices
  • Add a workaround for playing through sinks that use the openHAB audio servlet (Chromecast audio, etc.). Use the new parameter workaroundServletSink

Version 3.4.0_POST_method

  • Using a POST method allows for longer requests

Version 3.4.0_PR0

  • Initial release

Resources


Excellent, thanks.

I’ll give this a try.

Do you know if I can run it on a RPI3?

I don’t know, I haven’t tried it either. But I would be interested in the result.
The Mycroft website recommends a Pi 4, but the Pi 3 is also mentioned. It is worth a try, but I can’t test it myself as my Pi 3s run an armv7 OS (the real-time factor, RTF, seems very bad on it).

The CPU specs of the RPi 4 vs the 3 don’t differ much, so I’d guess yes, it should work. The 3 is very memory constrained though, so it depends on how much else you run there.

Is there a reason to maintain the binding as published in the marketplace, with risk of confusion for the users, while the binding is now part of the official distribution?

If I’m not mistaken, Mimic TTS is to be included in the 3.4 version, which is not yet released.
Do you think I should remove it now? I thought I should wait for the official release.

I think you can wait for the OH 3.4 release.
I thought it had been added earlier.

Hi,

I have been testing this binding and am having problems playing the audio on my configured sinks. I don’t know if it is related to my setup, but every time I execute a say command I get back a 500 error “Operation not supported”. I have tried web audio, Chromecast and a Yamaha MusicCast speaker with the same result. PicoTTS works without problems. Sending the URL of my mimic3 server with the Chromecast playURL action also works.

The only way I could get the binding to work was by modifying the code to cache the wav audio to a file. It looks like the 500 error “Operation not supported” is from getClonedStream().

After getting the binding to work, SSML was still not working. At least on my server, the ssml parameter (either ssml=1 or ssml=true) has no effect when using POST (it works with GET). I had to set the Content-Type header to application/ssml+xml for it to work.
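
To make that fix concrete, here is a minimal Python sketch of a POST request that carries SSML in the body and declares it through the Content-Type header, as described above. The `/api/tts` path, host and port are assumptions about the server setup, and `build_ssml_request` is a hypothetical helper name, not part of the binding.

```python
from urllib.request import Request

# Assumption: Mimic 3 server address and endpoint path; adjust to your setup.
TTS_ENDPOINT = "http://localhost:59125/api/tts"


def build_ssml_request(ssml: str, endpoint: str = TTS_ENDPOINT) -> Request:
    """Build a POST request whose body is SSML, flagged via Content-Type."""
    return Request(
        endpoint,
        data=ssml.encode("utf-8"),
        # Per the report above, this header is what made SSML work with POST.
        headers={"Content-Type": "application/ssml+xml"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_ssml_request("<speak>Hello <break time='1s'/> world</speak>")
    # Inspect the request without sending it, so no server is required.
    print(request.get_method(), request.full_url)
    print(request.header_items())
```

Passing the request to `urllib.request.urlopen` would send it; whether other Mimic 3 versions also need the header has not been checked here.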

I also noticed that the audio was not using the selected voice. For it to work, I had to URL-encode the voice name included in the URL.

Thanks for the work done on this binding. Please let me know if there is something I can test to find out whether these problems are related to my environment or not.

My guess is that the webaudio, Yamaha MusicCast and Chromecast sinks all use the openHAB servlet to get the sound to play?
This servlet uses getClonedStream(), which requires the TTS service to be able to retain all the audio data and not just provide a one-shot “stream”.
And as you already found out, MimicTTS does not have this capability: I designed it to be as stream-oriented and simple as possible.

Coincidentally, I have a pull request for openHAB core that, as a side effect, handles this issue by creating and keeping a file as a cache. A wrapper implements this getClonedStream method.
But it is not on track to be included in the 3.4 version (next week).
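
The idea of caching to a file so that a stream can be "cloned" can be sketched in a few lines. This is a Python illustration of the concept only, not the Java code of the binding or of the openHAB core pull request; `FileCachedAudio` and `cloned_stream` are hypothetical names.

```python
import io
import os
import tempfile
from typing import BinaryIO


class FileCachedAudio:
    """Copy a one-shot audio stream to a temp file so it can be re-read."""

    def __init__(self, source: BinaryIO, chunk_size: int = 8192) -> None:
        fd, self.path = tempfile.mkstemp(suffix=".wav")
        with os.fdopen(fd, "wb") as cache:
            # Drain the source once; afterwards only the file is used.
            while chunk := source.read(chunk_size):
                cache.write(chunk)

    def cloned_stream(self) -> BinaryIO:
        """Return a fresh, independent reader positioned at the start."""
        return open(self.path, "rb")

    def close(self) -> None:
        """Delete the cache file once no sink needs it any more."""
        os.remove(self.path)


if __name__ == "__main__":
    audio = FileCachedAudio(io.BytesIO(b"placeholder audio bytes"))
    with audio.cloned_stream() as first, audio.cloned_stream() as second:
        # Both readers see the full content, unlike a consumed network stream.
        print(first.read() == second.read())
    audio.close()
```

The trade-off is the one described above: the service is no longer purely streaming, since the whole audio must be written to disk before a servlet-based sink can read it.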

I see that you are a coder, great!
Could you share your code to save me some time?
We can include this as a workaround for sinks using the servlet (with a boolean parameter or something similar to activate it), and maybe remove it later if my pull request for caching TTS is accepted.

I think I forgot to test SSML when I changed the call method from GET to POST in one of my last commits… And I don’t use it. So… Oops, sorry! I will try to fix it.

Does this voice name include a special character? Indeed, I should have URL-encoded it. I will fix that too.

Thank you very much for your feedback and corrections!


This seems to be the case, thanks for the details.

Yes, I saw the pull request while searching for a solution. Looks like a great contribution.

I don’t know if I qualify as a coder, but I can throw some code together to have fun and make my life easier. I just borrowed some code from the PicoTTS and MaryTTS bundles. Here is the diff of the changes. It may at least be a starting point for the temporary workaround. I’m not making any promises about the quality of the code, but it seems to be working.
mimictts-changes.patch.txt (7.7 KB)

Here is the voice I was using: es_ES/m-ailabs_low#victor_villarraza
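
One plausible explanation for the voice being ignored is the `#` in that name: in a URL, everything after an unencoded `#` is treated as a fragment rather than as part of the query string. The Python sketch below shows the difference; the base URL and the `voice` query parameter are assumptions about the server setup, and whether this is exactly what happened inside the binding is not confirmed here.

```python
from urllib.parse import quote, urlsplit

VOICE = "es_ES/m-ailabs_low#victor_villarraza"
BASE = "http://localhost:59125/api/tts"  # assumed host, port and path

raw_url = f"{BASE}?voice={VOICE}"
# safe="" also encodes '/', which quote() would otherwise leave untouched.
encoded_url = f"{BASE}?voice={quote(VOICE, safe='')}"

# Unencoded: the speaker part after '#' is parsed as a fragment,
# so it is not part of the query string at all.
print(urlsplit(raw_url).query)      # voice=es_ES/m-ailabs_low
print(urlsplit(raw_url).fragment)   # victor_villarraza

# Encoded: the whole voice name stays inside the query string.
print(urlsplit(encoded_url).query)  # voice=es_ES%2Fm-ailabs_low%23victor_villarraza
```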

Thanks !

I made a pull request with all these elements.
I will try to update the JAR in the marketplace, but it requires me to switch back to the 3.4 version (dev is now on 4.0.0), so it will wait until I have other things to do on 3.4 :sweat_smile:.

New release available, with the fixes from rotec52 (SSML, some voice names being badly encoded, playback failing with some sinks).
The fix for playing through sinks that use the openHAB servlet requires activating a new parameter: workaroundServletSink.

Opening post updated. As usual, remove and reinstall from the marketplace.

Hi,
I tried to install mimic server on my openhab 3.4.1 pi with openhabian.
Got this:

```
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 137.4/137.4 kB 471.0 kB/s eta 0:00:00
ERROR: Cannot install mycroft-mimic3-tts[all]==0.2.2, mycroft-mimic3-tts[all]==0.2.3 and mycroft-mimic3-tts[all]==0.2.4 because these package versions have conflicting dependencies.

The conflict is caused by:
    mycroft-mimic3-tts[all] 0.2.4 depends on onnxruntime<2.0 and >=1.6
    mycroft-mimic3-tts[all] 0.2.3 depends on onnxruntime<2.0 and >=1.6
    mycroft-mimic3-tts[all] 0.2.2 depends on onnxruntime<2.0 and >=1.6
```

Can anyone help me? Am I now in dependency hell?
Best regards
Knut

As I use docker on an x86-64 PC, unfortunately I cannot help you here, sorry.
(But if you manage to run it on your Pi, I’m interested in your performance results.)

It looks like you’re trying to install three versions of the same thing.

Indeed it does. But I only gave one install command. I read that when the installation process runs into dependency problems, it automatically tries earlier versions.
However, is there anyone here who runs that server on their openHAB Pi 4 machine?


I’m trying this out now, as I use mimic elsewhere. I can see it calling my mimic3 server with no problem, and the OH audio logs show it getting some audio back. I’m trying to play via the web sink, but the browser just gets a 500 error when trying to play, e.g. http://localhost:8080/audio/478bea35-a162-405b-9fab-d53bfebc3c1b - 500

I’ve tried 3.4 and the 4 snapshot, and have tried both the marketplace and the bundled add-on. I also tried with and without the Workaround For Servlet-Based Audiosink option, and with the Sonos audio sink.

Any pointers, or a list of supported sinks and which options they need?

debug logs for org.openhab.core.audio:

Mimic (not playing):

19:32:19.942 [DEBUG] [o.internal.webaudio.WebAudioAudioSink] - Received audio stream of format AudioFormat [codec=PCM_SIGNED, container=WAVE, bigEndian=false, bitDepth=16, bitRate=52000, frequency=22050channels=1]

MaryTTS (playing ok):

19:33:47.519 [DEBUG] [o.internal.webaudio.WebAudioAudioSink] - Received audio stream of format AudioFormat [codec=PCM_SIGNED, container=WAVE, bigEndian=false, bitDepth=16, bitRate=768000, frequency=48000channels=1]
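
As a quick sanity check on those two log lines: for uncompressed PCM, the bit rate is normally frequency × bit depth × channels. The small Python sketch below applies that formula to the values from the logs. It only does the arithmetic; whether the sink actually relies on the reported bitRate, and whether this relates to the 500 error, is not established here.

```python
def pcm_bit_rate(frequency_hz: int, bit_depth: int, channels: int) -> int:
    """Nominal bit rate of uncompressed PCM audio, in bits per second."""
    return frequency_hz * bit_depth * channels


# (frequency, bitDepth, channels, bitRate reported in the log)
LOGGED = {
    "MaryTTS": (48000, 16, 1, 768000),
    "Mimic": (22050, 16, 1, 52000),
}

if __name__ == "__main__":
    for name, (freq, depth, chans, reported) in LOGGED.items():
        expected = pcm_bit_rate(freq, depth, chans)
        status = "matches" if expected == reported else "differs from"
        print(f"{name}: computed {expected} {status} reported {reported}")
```

The MaryTTS figure is consistent with the formula (768000), while the Mimic stream reports 52000 where the formula gives 352800, so the difference between the two logs is more than just the sampling rate.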

Cheers, Ross

Hello,

To investigate, I think I should try to reproduce your environment as closely as possible.
Can you tell me which mimic version you use (with distribution method and architecture)?
And also the voice?

I’m using the docker image. Below is what I’m currently testing with, using the UK voice.

I’ve just updated the message above with the audio logs for mimic vs MaryTTS (which does work). Could it be the difference in sampling rates?

```
docker run \
       --name=mimic3 --net=ross -it \
       -p 59125:59125 \
       -v "${HOME}/.local/share/mycroft/mimic3:/home/mimic3/.local/share/mycroft/mimic3" \
       'mycroftai/mimic3'
```

and then testing with the openHAB docker image:

```
docker run \
        --name openhab \
        --net=ross \
        -v /etc/localtime:/etc/localtime:ro \
        -v /etc/timezone:/etc/timezone:ro \
        -v /opt/openhab2/conf:/openhab/conf \
        -v /opt/openhab2/userdata:/openhab/userdata \
        -v /opt/openhab2/addons:/openhab/addons \
        -d \
        -e USER_ID=113 \
        -e GROUP_ID=121 \
        -e CRYPTO_POLICY=unlimited \
        -e OPENHAB_HTTP_PORT=8080 \
        -p 8080:8080 \
        -p 8101:8101 \
        openhab/openhab:snapshot-debian
```