Creating a voice audio satellite with the help of an ESP32

I also think that following the Home Assistant solution closely would be a great idea.
This way we can benefit from their client code and avoid fragmenting the young voice ecosystem.
I see at least three implementations to watch carefully:
1- The S3-BOX
2- The newly announced Home Assistant Voice (Preview Edition)
3- The Wyoming protocol for custom remote satellites (such as a Raspberry Pi). I would like to do something about it for openHAB sooner or later.

I also wonder whether 1 and 2 are based on the same protocol, or at least on loosely related ones?