Hi everyone,
while working on the MCP server you can find here: Openhab MCP Server , I learned a bit about what works well and what does not in terms of the LLM ↔ MCP interface. The implementation offers a lot of features, but all in all it is more of a wrapper around the REST API, which does not work well for quite a few use cases. For example, if you tell the LLM to switch off all the lights on the top floor, it starts collecting items like crazy to find out where it can get information about rooms, floors, lights and the like, forgetting half of what it has learned along the way, and in the end (if you’re lucky) it finds a few lights and switches them off. More often you get a timeout or a max-iteration warning from your agent and nothing happens at all, especially when the questions get more complex.
So I thought about a different approach, this time with only the end user in mind (the old implementation is basically a complete admin toolkit). The most important thing for me was to create a tool interface that is easy for an LLM to understand and can be used quickly, without needing a lot of iterations to get it right.
So, I finally came up with the openHAB Semantic MCP Server:
As the name suggests, the new implementation makes strong use of the semantic model and can only be used to control devices that are properly modeled there. Items tagged only with the generic “Point” or “Equipment” tags cannot be reliably controlled either, but their use is discouraged anyway, as they do not provide much additional semantic value.
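To illustrate what “properly modeled” means here, a rough sketch of the kind of check the server could apply. The tag names come from openHAB’s semantic model; the function itself is purely hypothetical and not taken from the actual server code:

```python
# Hypothetical sketch: decide whether an item is tagged specifically
# enough to be exposed as a controllable point. Only the tag names
# ("Point", "Equipment", "Switch", "Light") come from openHAB's
# semantic model; this function is invented for illustration.

GENERIC_TAGS = {"Point", "Equipment"}

def is_controllable(item: dict) -> bool:
    """An item qualifies if it carries at least one semantic tag that is
    more specific than the bare "Point"/"Equipment" class tags."""
    tags = set(item.get("tags", []))
    specific = tags - GENERIC_TAGS
    # e.g. {"Switch", "Light"} -> controllable; {"Point"} alone -> too generic
    return bool(specific)
```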
I’m pretty sure there will be bugs, so I’m more than happy to get some feedback from you. But for the use case it is made for, namely retrieving information about your home and sending commands/updates to devices, it delivers quite impressive results compared to the other implementation in terms of speed and reliability. Feel free to give it a try and let me know if/how it works for you!
The cool thing is that if something is not working as expected, it can most likely be fixed by correcting the semantic model in openHAB. I already discovered many shortcomings in my own model, and once these were corrected, everything just worked nicely.
The server caches all item information locally; state changes are retrieved live via the openHAB SSE interface, so state information is always up to date. However, to also pick up static changes, the inventory is refreshed periodically (every 60 minutes by default, but this can be adjusted).
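The caching scheme described above can be sketched roughly like this. To be clear: the class and method names are made up for illustration, and the real server’s internals may look quite different; only the general idea (periodic inventory refresh plus live state patches from SSE events) comes from the post:

```python
import time

# Illustrative sketch of the caching scheme: a full inventory snapshot
# that is refreshed periodically, while individual item states are
# patched in as SSE events arrive. All names are invented.

class ItemCache:
    def __init__(self, fetch_inventory, refresh_minutes=60):
        self._fetch = fetch_inventory          # callable returning {name: item}
        self._ttl = refresh_minutes * 60
        self._items = self._fetch()
        self._last_refresh = time.monotonic()

    def on_sse_event(self, name, new_state):
        """Apply a live state change received from the SSE stream."""
        if name in self._items:
            self._items[name]["state"] = new_state

    def get(self, name):
        """Return an item, refreshing the whole inventory first if the
        periodic interval has elapsed (this is what picks up static
        changes such as added or renamed items)."""
        if time.monotonic() - self._last_refresh > self._ttl:
            self._items = self._fetch()
            self._last_refresh = time.monotonic()
        return self._items.get(name)
```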
Make it clear to your LLM that it is mandatory to retrieve the possible filter values from the MCP server first, and not to just guess room names and the like. The server returns a meaningful error message if the LLM tries to do so anyway. Here is an example system message (I haven’t played around with it much, but it seems to work OK):
You are a home automation control agent.
SOURCE OF TRUTH:
- The server inventory is the ONLY source of truth.
- You MUST NOT assume, guess, translate, or invent locations, rooms, device types, or item names.
- Human-friendly names (e.g. "Erdgeschoss", "Salon") MUST NOT be used unless they were returned by the server tools.
DISCOVERY RULES:
- Before referencing a location, device type, or point type for the first time in a session,
you MUST retrieve the available options from the server.
- If a user mentions a term that has not been returned by the server, you MUST perform discovery.
TOOL USAGE:
- All tool calls MUST strictly follow the provided JSON schema.
- All selections MUST use values exactly as returned by the server tools.
- Free-text values are NOT allowed in selections.
ERROR HANDLING:
- If a tool returns an error indicating an unknown or invalid value, you MUST correct the input and retry.
- You MUST NOT continue with empty or invalid selections.
ITEM NAME RULES:
- You MUST NOT display item names by default to the user.
- Item names may only be used internally for refinement if there is ambiguity.
- Only present item names to the user when explicitly requested or when a refinement is necessary to execute a command.
GENERAL:
- When in doubt, retrieve inventory information instead of guessing.
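To illustrate the error-handling contract the rules above rely on, here is a hypothetical sketch of how a server-side tool might reject a guessed filter value. The function name and message wording are invented; only the behavior, rejecting unknown values with a message that points the LLM back to discovery, comes from the post:

```python
# Hypothetical sketch of server-side validation of a filter value.
# A guessed location that was never returned by discovery is rejected
# with a message telling the caller what the valid options are.

def validate_location(requested: str, known_locations: set) -> dict:
    if requested in known_locations:
        return {"ok": True, "location": requested}
    return {
        "ok": False,
        "error": (
            f"Unknown location '{requested}'. "
            f"Valid locations are: {', '.join(sorted(known_locations))}. "
            "Retrieve the available options first instead of guessing."
        ),
    }
```

This is also why the “ERROR HANDLING” rules tell the LLM to correct its input and retry: the error message already contains the valid values, so one retry is usually enough.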
