Openhab-claude

Dear Community,

I want to share my openhab-claude repo, a configuration for binding development with the Claude desktop app (tested) and, in future, also with open-webui (not fully tested).

Check the readme for features. Basically, it provides:

  • a complete development team with concept, architecture, development, quality and document writer roles
  • support for the openHAB coding guidelines, review checklist and md checker, extendable with your personal preferences
  • and more - see readme

Feedback highly appreciated.

My observations

I tested this setup by turning the BrightSky API into a binding. It is not feature complete and not bug free, but note: not a single line was written by me! Code is here, binding jar is here.
Time to complete with vibe coding: ~3 hours

  1. Give $Concept the BrightSky API
    Result is CONCEPT.md
    Worked seamlessly - minutes - tokens moderate
  2. Give $Architect CONCEPT.md
    Result is ARCHITECTURE.md plus some decision results
    Worked seamlessly - minutes - tokens moderate
  3. Let $Dev implement ARCHITECTURE.md
    Result is Java code
    Many rounds of providing Maven logs and fix loops - 2 hours - token killer
  4. Let $QA write unit tests
    Result is Java test code
    Many rounds of providing Maven logs and fix loops - 1 hour - token killer
  5. Let $Writer fill out readme.md
    Seamless
  6. Let $Review make a final check
    Seamless

This really is going to be the future isn’t it? Yesterday I built an NTP server complete with jamming detection capabilities.
The next couple of years are going to be craaaazy.
Great work though, I just wish someone worked on a binding for locally run models like those from ollama or llama.cpp.
Perhaps a feature request? :wink:

You can already run Claude Code against a local LLM.
And you have other options as well, if you’d rather use an open source assistant instead of Claude. For example, opencode is compatible with claude AGENTS.md, skills, etc. So most of Bernd’s repo should be compatible out of the box.

Out of interest: how much did the whole process cost you? I guess this will get very expensive quite quickly!?

This might be a cool new approach for a bounty program: estimate the cost for a new binding and start development once enough people donated for it :+1:

You could consider contributing the skills and the claude.md as a .claude/ in the root of the repository. It would work out of the box if run from the root.

I also have positive experience; it could even reverse engineer some values from an undocumented response based on the variations.

It’s useful but the cost may rise soon indeed. Let’s see…

As @dalgwen said you can include a local LLM into claude code.

If you want to be independent of such tools, e.g. with ollama and open-webui, you have a long way to go because the harness (infrastructure) is missing by default. Web search, document analysis, writing to the filesystem, Maven execution, … all need to be set up by yourself.

The above prototype was developed entirely within my claude pro plan. This plan gives you 40,000 tokens per 5-hour window. The ~3 hours cost ~80% (32,000) of those tokens. My guess is that it’s close to a release: unit tests, documentation and review are done; some bugs and improvements still need to be addressed.
With one or two more time windows it should be release grade.

This was an easy prototype translating a REST API into a binding and channels. If other protocols are used (modbus, matter, …) for sure this can eat more tokens.

Sure, but I still have the whole openhab-addons structure in mind. Not every binding should have its own .claude directory. The review list, coding conventions and other context parameters can change, so the maintenance can get out of control.

I must say that I feel like we’re on different planets, how do you judge that? By letting the “AI” evaluate what it has done, or by thoroughly testing the binding yourself?

I haven’t invested money into any paid plans or tried everything I come across of such tools, because what I’ve seen so far is so far from being able to write complex code that it just feels like we’re living in different realities. From what I’ve read, “Claude” is pretty much on par with everything else in that it can do really simple stuff quite well, but messes up as soon as there is a bit of complexity - but with the same “AI confidence” as always, so you’ll never know that it has messed up unless you check it yourself.

Are those who have used it and claim this just way off target? I can’t get things to line up here…

edit: I took a very quick look at the code, and it should be mentioned that this is an extremely simple “binding”, it just polls a web API. It can’t “do” anything. And, the refresh logic is made without any form of thread-safety, but that’s not uncommon among OH bindings, so it probably just copied this from elsewhere. Still, if it was at all “smart”, it should understand that it’s needed.
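
For illustration, a minimal sketch of refresh scheduling that guards its state with a lock. This is not the binding’s code: the class name `SafePoller` and its methods are made up for the example, and it uses only `java.util.concurrent` from the JDK rather than openHAB’s own scheduler.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical poller; illustrates guarding the refresh job with a lock. */
class SafePoller {
    private final ScheduledExecutorService scheduler;
    private final Runnable pollTask;
    private final Object lock = new Object();
    private ScheduledFuture<?> job; // only read or written while holding lock

    SafePoller(ScheduledExecutorService scheduler, Runnable pollTask) {
        this.scheduler = scheduler;
        this.pollTask = pollTask;
    }

    /** Starts polling; a job that is already running is cancelled first. */
    void start(long intervalSeconds) {
        synchronized (lock) {
            cancelLocked();
            job = scheduler.scheduleWithFixedDelay(pollTask, 0, intervalSeconds, TimeUnit.SECONDS);
        }
    }

    /** Stops polling; safe to call from any thread, any number of times. */
    void stop() {
        synchronized (lock) {
            cancelLocked();
        }
    }

    boolean isRunning() {
        synchronized (lock) {
            return job != null;
        }
    }

    // Caller must hold lock.
    private void cancelLocked() {
        if (job != null) {
            job.cancel(true);
            job = null;
        }
    }
}
```

With this shape, calling `start` twice replaces the previous job instead of leaving two pollers running side by side, and `stop` cannot race with a concurrent `start`.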

Ah, I see. I thought you were using an API account. Using the pro plan is better of course, this will give you more for your money :+1:

No, we are living on the same planet :slight_smile:
Regarding judgement: I’m the judge!

Perhaps there’s a misunderstanding that this repo enables claude to write bindings completely by itself! This is definitely not the case.

  • claude defines the CONCEPT.md - I have to judge
  • claude defines ARCHITECTURE.md - I have to judge
  • claude writes the code - I have to review
  • I have to test the binding in the system

Before github kicks in

  • coding conventions out of the box
  • review out of the box
  • readme documentation out of the box
  • md file checking out of the box

What is your definition of complex code? And honestly, I don’t mean to be offensive!
Example: Bringing my smart home into an energy management system is my passion. So what’s the power/energy profile and frequency of my washing machine, dishwasher, dryer?

So I gave claude the sql query of the past year

  • standby power: lower 10th percentile analyzed perfectly
  • short peaks in between - analyzed perfectly
  • general profile: 80th percentile analyzed
  • frequency analyzed, of course on average
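
What those two percentiles mean can be sketched in a few lines. This is not what claude ran on my data; it is a plain nearest-rank percentile, and the class name `PowerProfile` is a hypothetical name chosen for the example.

```java
import java.util.Arrays;

/** Hypothetical helper: nearest-rank percentile over power samples in watts. */
class PowerProfile {
    /**
     * Returns the smallest sample such that at least p percent of all
     * samples are less than or equal to it (nearest-rank definition).
     */
    static double percentile(double[] samples, double p) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        double[] sorted = samples.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }
}
```

Under this reading, `PowerProfile.percentile(watts, 10)` approximates the standby level and `PowerProfile.percentile(watts, 80)` the typical running level of an appliance, provided the samples cover enough idle and active time.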

I think nobody has used it so far. I can say it sped up my development. The first time I used this was to introduce a new feature in the DIRIGERA binding. Instead of weeks of getting an idea, analyzing, implementing and testing, it took just a few days.

Nobody said anything different!

Just note that these REST APIs are pushing valuable data into your smart home system like weather forecasts, renewable energy forecasts, solar forecasts, energy prices, BEV vehicle data.

I have never attempted to use a model for statistical analysis - they might be very capable at that for all I know. I’m talking about computer programming code with complex logic without making lots of hard-to-find bugs.

We must be talking about different things. I was referring to using “Claude” for computer programming. It sounds like you are perhaps thinking of the setup/framework you’ve come up with here to use with “Claude”? Because, obviously, I can’t read anything about people’s experience with that.

I don’t know quite how to “translate” that to my world: I rarely set out to make something, I “see a need” and then try to solve it - thus, the “idea” is already there. From there, implementing the functionality can often be quite quick, depending of course on the scope of the “feature”. I wrote an OH binding for controlling a brand of electric heaters from a local REST API available in the devices over Wi-Fi as my “first coding experiment” with OH. It took me a few days to figure out all the details for how to set everything up, create a binding, and all that. The actual coding of the brunt of the binding then took 2–3 days. All in all it took me about a month, where the majority of the time was spent testing, tweaking and fine-tuning. Only when that was done, did I finish “the formalia” like writing documentation, because I feel it’s wasteful to have to keep changing the documentation as I change my mind.

If I had to do it again, I would be quicker on setting it up and on “aligning with OH” because those things are more familiar for me. But, I’m not so sure that the “basic formula” would be that different, 15-20% of the time might go to writing the logic itself, the rest on testing, finding areas with “potential for improvement” like improved user experience, simplifying use, and then implement that and more testing.

I could be wrong, but to me, it sounds like I would trade compressing that initial 15-20% perhaps in half with expanding the testing/tweaking phase by a lot. How much I don’t know, I don’t know if I would end up replacing almost all the initial code by the time I was done with the testing/tweaking phase, but I have a suspicion that a lot would have to be replaced. But, even if that’s not the case, I would have a huge disadvantage when testing/tweaking by not knowing the code intimately, which is a huge help in figuring out exactly why something happens, what to do about it, and what is and isn’t possible.

My “feeling” is that it would slow down the whole process significantly. You get to “something doing something similar to what you wanted” a bit quicker, and then spend much more time making it actually behave well. A key factor is of course what you consider “good enough”. I’m somewhat of a perfectionist, so I have a high threshold before I’m content. The lower that threshold, the more “advantageous” using the “AI route” might be, because by cutting down the testing/tweaking phase significantly, the part where you can save time becomes a larger part of the total.

Sure, I’m just thinking that many bindings are much more than that. Some of those “API polling” bindings are on the verge of what you could do using the HTTP binding. What would be very useful is if somebody made a “REST API binding” where you could configure the endpoints, define channels and define how endpoints would populate channel data. If that was done in a good way, it could probably be used for many such services, and users could share “the configuration” needed to use different services.
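
A rough sketch of the idea, assuming the response has already been parsed into a flat map. Every name here (`ChannelMapping`, `GenericRestMapper`, the channel IDs, endpoints and field names) is invented for illustration, and no openHAB API is involved.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, shareable configuration entry: one response field feeds one channel. */
class ChannelMapping {
    final String channelId;
    final String endpoint;
    final String field;

    ChannelMapping(String channelId, String endpoint, String field) {
        this.channelId = channelId;
        this.endpoint = endpoint;
        this.field = field;
    }
}

/** Hypothetical mapper that turns one endpoint response into channel values. */
class GenericRestMapper {
    static Map<String, Object> toChannelValues(String endpoint, Map<String, Object> response,
            List<ChannelMapping> mappings) {
        Map<String, Object> values = new LinkedHashMap<>();
        for (ChannelMapping mapping : mappings) {
            // Only mappings for this endpoint whose field is present contribute a value.
            if (mapping.endpoint.equals(endpoint) && response.containsKey(mapping.field)) {
                values.put(mapping.channelId, response.get(mapping.field));
            }
        }
        return values;
    }
}
```

The list of `ChannelMapping` entries is the part users could share per service; the mapper itself stays the same for every API.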

This was my example of identifying a power profile. This works for me. Maybe it doesn’t work for others, but then those errors need to be analyzed in detail.

Again, no offense, but this is too fuzzy for me. What do you mean by complex logic?

The setup is described in the readme and it refers to the desktop app. You have to register but there’s a free plan. And of course nobody is using that yet, because this is the post introducing this repo, which was published on 1st of June!

Please believe me; I read your whole passage below. But this first sentence is explaining it all. You don’t know what you want! Imagine you’re a Manager and your Employees shall do the job. What are your expectations? You need to state them! Doesn’t matter if they are human or AI!

I fully agree! But that’s your job! I provided a simple example from REST API towards channels. If you want more you need to state your requirements in a very clear way! Explain it as you would to a human!

Exactly what I explained. Computer code, in e.g. Java, C++ etc. that contains complex logic, must handle various concerns correctly and consistently with a “sound logic”. I’m afraid I can’t give you an example, there are plenty of software projects out there that utilize complex logic. It’s not about applying “advanced algorithms” or anything else you can read in a paper and apply blindly, but about keeping track of all the concerns needed correctly.

I obviously won’t test that, I never sign up for “free plans” - I’m tired of constantly being baited and harassed into paying.

This is just trying to pick an argument, I’m not interested in that. I know perfectly well what I want, and that’s not being a manager having employees make things for me.

But why would I do that? What’s the benefit? Why do all that extra work trying to communicate what I already know, to have a sub-par result that I end up having to rewrite anyway?

So basically, what bothers you is that AI-generated code is like a kid coloring your notes and saying, “Look dad, I can write like you.” It’s the same outcome, but a totally different reality.
I believe the hardcore devs fighting this wave are losing this one. The problem is that nobody sees coding as an art; they just see the idea or the outcome.
Eventually, these LLMs will have so much context capability that they will be indistinguishable from an expert human with 40 years of safe, proper coding habits.

This has nothing to do with art as I see it, but with quality. Code quality has kept going down for a long time, as “easier” ways to code have been made available (dumbed-down languages, “functional” programming etc.). The problem is that those details that are hidden still matter, so performance suffers - so we throw more money at it with faster hardware to compensate. But, it has another effect. It brings people into software development that aren’t really suited, they lack the correct mindset. So, quality goes down as well.

The only ones that benefit are those making the money, because as the task becomes “easier”, they can pay people less to do it - because they can hire from a much bigger pool. So, “capital wins”, everybody else loses. We’ve become used to it now, so we think it’s the way it should be. Software is shipped long before it’s ready, there’s basically no quality standard and “we can always just push an update should something really ugly appear”.

I see the whole “AI coding” thing as an extension of this “trend”. Quality will become even much worse, but “the capital” will get their crapware cheaper, so they don’t care. People will accept even lower quality as long as they do it gradually. So, it might “work” for those that want to maximize profit.

Does that mean that the rest of us should cheer it on? I consider myself “neutral” - I will use it if I see real benefit, but I won’t chase after every hype. I would love if the “coding agents” could make better quality stuff than they do, but I’m skeptical if LLMs will ever achieve that. Maybe, maybe not. It’s hard to predict.

What I am against is the constant lowering of standards, where we’re supposed to “get used to” that nothing ever really works. Because it’s unnecessary, and it’s only so because of greed.

It’s not exactly difficult to find examples of “AI code” being banned or heavily restricted these days. Could it be that some of these people actually have a point, or are they all just “grumpy old guys that don’t want to embrace the future”?

In short, if you’re interested, you can find lots of information that explains what the problems are. And, speculating about “how incredibly good they will have become in 5 or 10 years” is pointless as I see it. Nobody knows what will be - all we can judge is what we have now. And that, in many circumstances, causes more trouble than value.

You are absolutely right about corporate greed and the trend of lowering standards, but that is exactly why this shift is inevitable and why the hardcore devs are fighting a losing battle. Capital always wins, and it will always optimize for the outcome over the craft.
You are judging the entire future of this technology based on what it is right now. Yes, Zig, Flathub, and others are banning AI contributions today because the current tools are still in their infancy and can generate garbage. But using today’s limitations to dismiss where this is heading is just sticking your head in the sand.
When I mentioned that people don’t see coding as an “art,” I meant exactly what you are describing: that deep care for quality, performance, and the hidden details. The veterans care about that. The market doesn’t; it just wants the idea or the outcome.
But here is where AI is fundamentally different from the “dumbed-down languages” or the influx of people who lack the correct mindset. Unlike humans, a mature AI won’t get lazy, cut corners, or demand a lower salary for lower effort. Once the context capabilities scale up, you won’t just get cheap code; you will get cheap code that instantly and flawlessly applies 40 years of safe, proper coding habits.
The bans happening today are temporary band-aids. They won’t mean anything when the AI stops making the mistakes they are trying to block. The reality remains: the outcome is all the world cares about, and eventually, the LLM’s reality will be indistinguishable from an expert human’s.

You’re basing all this on an expectation of what will be. If you’re right, the situation will be very different than what it is now.

I’m not in the prophecy business though. All I know is what we have, not what will be, so my conclusions are based on that. I fear that they might be “at their best”, quality-wise, now. Not because I don’t think the models will improve, but because it looks like we’re approximately at the point where they will start “learning” from themselves. As of now, most of the “knowledge” out there is still human made. That is changing, and that means that their training data will take a nose-dive quality-wise. And this will probably just become worse and worse. “Slop recycled” doesn’t sound promising to me.

Back to the original topic: I think this is a great contribution with significant potential for openHAB.

Home Assistant has always had the advantage of quickly supporting even quite exotic devices, largely due to its relatively low barrier for creating new integrations.

openHAB always needed highly skilled developers with OSGi knowledge who are willing to take over the development of a binding.

Even if the quality of upcoming Claude-based bindings is not at a perfect level, they could still help grow and strengthen the community.

@weymann : So thanks for the approach. I’m going to give it a try.

I absolutely share the idea that this is a challenge for OH. But, if the quality is “bad enough”, it could just make things worse, we’ve seen how much “bad bindings” can lower the overall impression of OH. So, let’s hope the results are good enough to make things better, not worse.

Off-topic: There is quite a lot of “boilerplate” in simple bindings, some smart heads should figure out how all this could be taken care of so that binding authors just had to fill in the actual “logic” needed to handle a device.