Creating a voice audio satellite with the help of an ESP32

milo · December 1, 2024, 8:38am

Split from here…

https://community.openhab.org/t/ideas-and-discussion-what-features-do-you-want-in-openhab-5-0/160573/25

Ideas and Discussion: What Features Do You Want in openHAB 5.0?

Hi everyone,

I wanted to share an idea for openHAB 5.0: integrating the ESP32-S3 as a lightweight voice satellite that streams audio to openHAB, so that voice can be captured in several rooms while recognition runs centrally. This could provide a cost-effective solution for smart homes that rely on voice control, leveraging the openHAB ecosystem to process commands centrally.

Overview

The concept involves using the ESP32-S3 to capture audio via I2S and stream it over Wi-Fi to openHAB using a WebSocket or similar protocol. The audio can then be processed by openHAB’s voice recognition engine or an external service. The basic example below uses a plain TCP socket instead of a WebSocket to keep things simple.

ESP32-S3 Code (Streaming Audio via Wi-Fi)

#include <WiFi.h>  
#include <driver/i2s.h>  

// Wi-Fi credentials  
const char* ssid = "YOUR_SSID";  
const char* password = "YOUR_PASSWORD";  
const int port = 12345;  

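// I2S pin assignment. Adjust these for your board: the numbers below come
// from a classic ESP32 layout, and GPIO 25 is not available on the ESP32-S3.
#define I2S_WS 25
#define I2S_SD 26
#define I2S_SCK 27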

void setup() {  
    Serial.begin(115200);  

    // Connect to Wi-Fi  
    WiFi.begin(ssid, password);  
    while (WiFi.status() != WL_CONNECTED) {  
        delay(1000);  
        Serial.println("Connecting to WiFi...");  
    }  
    Serial.println("Connected to WiFi");  

    // Configure I2S  
    i2s_config_t i2s_config = {  
        .mode = (i2s_mode_t)(I2S_MODE_MASTER | I2S_MODE_RX),  
        .sample_rate = 44100,  
        .bits_per_sample = I2S_BITS_PER_SAMPLE_16BIT,  
        .channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,  
        .communication_format = I2S_COMM_FORMAT_I2S,  
        .dma_buf_count = 8,  
        .dma_buf_len = 512,  
        .use_apll = false  
    };  

    i2s_pin_config_t pin_config = {  
        .bck_io_num = I2S_SCK,  
        .ws_io_num = I2S_WS,  
        .data_out_num = I2S_PIN_NO_CHANGE,  
        .data_in_num = I2S_SD  
    };  

    i2s_driver_install(I2S_NUM_0, &i2s_config, 0, NULL);  
    i2s_set_pin(I2S_NUM_0, &pin_config);  
}  

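// Address of the machine running the receiver (placeholder, like the credentials above)
const char* host = "YOUR_SERVER_IP";

void loop() {
    WiFiClient client;

    // The receiver listens on `port`, so the ESP32 connects out as a client
    if (!client.connect(host, port)) {
        Serial.println("Connection failed, retrying...");
        delay(1000);
        return;
    }

    Serial.println("Connected to receiver");
    uint8_t buffer[512];
    size_t bytes_read;

    while (client.connected()) {
        // Block until the I2S driver delivers captured samples
        i2s_read(I2S_NUM_0, buffer, sizeof(buffer), &bytes_read, portMAX_DELAY);
        if (bytes_read > 0) {
            client.write(buffer, bytes_read);
        }
    }

    client.stop();
    Serial.println("Receiver disconnected");
}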
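
As a rough sanity check on the Wi-Fi load, the I2S settings in the sketch above (44,100 Hz, 16-bit, mono) work out to about 88 kB/s of raw PCM, and one 512-byte buffer holds roughly 5.8 ms of audio. The small helper below just does that arithmetic. The class and method names are made up for illustration, and the 16 kHz comparison assumes that a lower rate would be acceptable for speech, which this thread does not confirm.

```java
public class StreamBudget {
    /** Raw PCM payload rate in bytes per second (no protocol overhead). */
    public static int bytesPerSecond(int sampleRateHz, int bitsPerSample, int channels) {
        return sampleRateHz * (bitsPerSample / 8) * channels;
    }

    /** Milliseconds of audio contained in one buffer of the given size. */
    public static double bufferMillis(int bufferBytes, int sampleRateHz,
                                      int bitsPerSample, int channels) {
        return 1000.0 * bufferBytes / bytesPerSecond(sampleRateHz, bitsPerSample, channels);
    }

    public static void main(String[] args) {
        // Settings used in the ESP32 sketch
        int rate = bytesPerSecond(44100, 16, 1);
        System.out.println("44.1 kHz mono: " + rate + " bytes/s");
        System.out.println("512-byte buffer: " + bufferMillis(512, 44100, 16, 1) + " ms");

        // Hypothetical lower rate for comparison (assumption, not from the thread)
        System.out.println("16 kHz mono: " + bytesPerSecond(16000, 16, 1) + " bytes/s");
    }
}
```

These figures cover only the audio payload; TCP/IP framing and Wi-Fi retransmissions come on top.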

Java Program (Receiving and Playing the Audio Stream)



import javax.sound.sampled.*;  
import java.io.BufferedInputStream;  
import java.io.InputStream;  
import java.net.ServerSocket;  
import java.net.Socket;  

public class AudioReceiver {  
    private static final int PORT = 12345;  
    private static final int BUFFER_SIZE = 512;  

    public static void main(String[] args) {  
        try (ServerSocket serverSocket = new ServerSocket(PORT)) {  
            System.out.println("Listening for connections on port " + PORT);  
            Socket socket = serverSocket.accept();  
            System.out.println("Client connected");  

            // 44.1 kHz, 16-bit, mono, signed; little-endian to match the ESP32's byte order
            AudioFormat format = new AudioFormat(44100, 16, 1, true, false);
            DataLine.Info info = new DataLine.Info(SourceDataLine.class, format);  
            SourceDataLine audioLine = (SourceDataLine) AudioSystem.getLine(info);  
            audioLine.open(format);  
            audioLine.start();  

            InputStream input = new BufferedInputStream(socket.getInputStream());  
            byte[] buffer = new byte[BUFFER_SIZE];  

            int bytesRead;  
            while ((bytesRead = input.read(buffer)) != -1) {  
                audioLine.write(buffer, 0, bytesRead);  
            }  

            audioLine.drain();  
            audioLine.close();  
            socket.close();  
            System.out.println("Connection closed");  

        } catch (Exception e) {  
            e.printStackTrace();  
        }  
    }  
}
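
If the stream is to be handed to a recognizer rather than played back, the receiver has to interpret the bytes itself. The sketch below shows one way to do that, assuming the ESP32 sends signed 16-bit samples in little-endian order (its native byte order). `PcmDecoder` and its methods are illustrative names, not part of openHAB.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class PcmDecoder {
    /** Decodes signed 16-bit little-endian PCM; a trailing odd byte is ignored. */
    public static short[] decode(byte[] data, int length) {
        short[] samples = new short[length / 2];
        ByteBuffer.wrap(data, 0, length)
                  .order(ByteOrder.LITTLE_ENDIAN)
                  .asShortBuffer()
                  .get(samples);
        return samples;
    }

    /** Largest absolute sample value, usable as a crude level meter. */
    public static int peak(short[] samples) {
        int peak = 0;
        for (short s : samples) {
            peak = Math.max(peak, Math.abs((int) s));
        }
        return peak;
    }

    public static void main(String[] args) {
        // Three samples: 0x0100, 0xFFFF, 0x8000 (low byte first)
        byte[] chunk = {0x00, 0x01, (byte) 0xFF, (byte) 0xFF, 0x00, (byte) 0x80};
        short[] samples = decode(chunk, chunk.length);
        for (short s : samples) {
            System.out.println(s);
        }
        System.out.println("peak = " + peak(samples));
    }
}
```

Because TCP does not preserve message boundaries, a single `read` may end in the middle of a sample; a real receiver would need to carry that odd byte over to the next chunk instead of dropping it as this sketch does.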

Key Benefits

Cost-Effective: ESP32-S3 is affordable and widely available.
Decentralized Capture: Audio can be captured in multiple rooms and processed centrally.
Scalability: Supports multiple devices streaming to one central hub.
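
To make the scalability point concrete, here is a minimal sketch of a hub that accepts several satellites at once, assuming each satellite connects to the hub as a TCP client on the same port that `AudioReceiver` listens on. It only counts the bytes received per connection; `MultiSatelliteReceiver` and `drain` are hypothetical names, and forwarding the audio to a recognizer is left out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class MultiSatelliteReceiver {
    private static final int PORT = 12345;

    /** Reads one satellite's stream until it closes and returns the byte count. */
    public static long drain(InputStream in) {
        byte[] buffer = new byte[512];
        long total = 0;
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n; // a real hub would hand the PCM chunk to the recognizer here
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        ExecutorService pool = Executors.newCachedThreadPool();
        try (ServerSocket server = new ServerSocket(PORT)) {
            System.out.println("Listening for satellites on port " + PORT);
            while (true) {
                Socket socket = server.accept();
                // One worker per satellite, so a slow stream does not block the others
                pool.submit(() -> {
                    String id = String.valueOf(socket.getRemoteSocketAddress());
                    try (Socket s = socket; InputStream in = s.getInputStream()) {
                        System.out.println(id + " sent " + drain(in) + " bytes");
                    } catch (IOException | UncheckedIOException e) {
                        System.err.println(id + " failed: " + e.getMessage());
                    }
                });
            }
        }
    }
}
```

A thread per connection is the simplest choice for a handful of rooms; identifying which room a stream belongs to would need something extra, such as a short header sent by each satellite.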

Next Steps

For those interested, feel free to suggest improvements, contribute code, or discuss the technical feasibility.

I can help with the ESP32-S3 code, @florian-h05.

milo · December 17, 2024, 5:36am

@florian-h05 quick ping: for me it’s unclear how to proceed here… let me know if you need anything…

florian-h05 · December 17, 2024, 7:15am

I would say we need to wait for the merge of [audio] Add pcm audio websocket with dialog support by GiviMAD · Pull Request #4032 · openhab/openhab-core · GitHub; after that, work on the ESP code can start.
I would suggest you subscribe to that PR on GitHub so you get notified once it is merged and I don’t have to remember to tell you.