"AI" generated bug reports, GitHub issues etc

I don’t really know where to put this topic, so feel free to move it somewhere more appropriate.

We’re already dealing with some users submitting “AI” generated bug reports, issues etc. I also suspect that we deal with at least partly “AI” generated content/claims in other dialogue, even when it’s not a pure copy/paste.

It might be impossible to properly tell what is what, and I expect this to be a problem that will only grow larger, but I think it’s time to consider having some stated policy on the topic.

The problem is that while it’s very convenient for the user to let an “AI” file their bug report, the result is often both verbose and full of errors and erroneous claims, mixed with genuine information and correct conclusions. Disentangling this mess is hard and frustrating work, and very demotivating for people trying to help others.

If you can’t even trust that the other party is at least trying, to the best of their ability, to provide correct information, the whole concept of “helping others” quickly becomes untenable, or at least that’s how I feel about it.

While it wouldn’t solve the problem, I think it would be helpful if there were rules that stated to what extent users are allowed to submit “AI” generated bug reports or problem descriptions. That would at least allow us to refer to those “rules” instead of having to face it “on an individual basis”. I have already had numerous occasions where I’ve had to state that I’m not willing to engage/help because of “AI” generated input that makes me waste a lot of time chasing hallucinations and trying to separate fact from fiction. It always feels like I’m “uncooperative” when I have to do this. The alternative is what I think many do: simply not respond. But neither is optimal.

Here is an example of the very problem I’m thinking of: Rules list incomplete in 5.2.0.M5 – only 11 of 64 JS file rules shown · Issue #5619 · openhab/openhab-core · GitHub

It’s not that there isn’t a problem, there is, but it’s hard work trying to separate actual info from BS.

The rules for each repo are ultimately set by the maintainers of the repo. Theoretically a policy could be raised up to the AC level but I don’t know that we are at this level yet. You might open a discussion issue on the repos where this is occurring the most (core and webui probably, maybe addons) and see if the maintainers can come up with a policy that makes sense for them.

If I were king I would set the policy as follows:

  • Use of AI in responses must be disclosed so the developers can assess how reasonable the information being posted is.
  • Use of AI without disclosure which is later discovered may result in the issue summarily being closed, with the option to reopen if the original issue filer materially participates in the thread and discloses when it is them and when it is the AI talking.
  • Use of AI in a PR or code is OK but it must be thoroughly tested before it’s submitted as a PR. (I’m sure this could be written better and I don’t think it covers everything.)
  • Repeated violations of the repo policies can result in becoming blocked from opening new issues and PRs. (Assuming this isn’t already a rule.)

I really understand why Flatpak has just banned AI PRs entirely. It’s not just the poor quality of the code and the interactions, but the people who are doing this tend to be entitled and rude. I definitely do not want to see that become a problem for OH.

I tend to disagree with your #1 and #2
Our (= all maintainers’) goal should be to have no AI reports without a big button disclaimer on the front because, as @Nadahar has absolutely correctly put it,

For that to happen, we need to have an OH wide policy, put up by the arch council and enforce it (eventually even sanction).
It’s up to every maintainer what (s)he makes of it (such as creating another issue to contain the valid parts), but without a strictly enforced, comprehensive and congruent policy, we’ll end up discussing whether to apply rules, and which ones, for each and every issue in question.

The only authority the AC really has is when:

  • there is a disagreement among the maintainers of a given repo that they cannot resolve among themselves
  • there is a disagreement between maintainers of different repos that they cannot resolve among themselves

We have no such controversy here yet so the AC can’t get involved. The individual repos need to give it a try first I think.

Of course, any maintainer (I think any member of the openHAB project on github actually) can raise an issue to the AC, but based on the rules of the AC I don’t see how they have authority to impose it. From the AC’s about:

The purpose of the AC is to be an authority to appeal to when there is an impasse among the developers and maintainers of the various parts of OH or to help maintainers from different parts of OH to come to a decision on how to address a cross cutting change. It is not intended to be a place where dictates are handed down from on high. Team Discussions: openhab/openhab-distro

The AC can help negotiate an OH wide policy but they cannot impose a policy from on high nor can they enforce it.

Oh well, that now was, except for the language, a very “German” answer.
You’re sometimes Germaner than I am (no typo) :rofl:

I don’t think there’s anything that really hinders the AC from becoming proactive, is there?

I’m not after a “legally binding” policy, but an “OH wide” guideline, so that everybody doesn’t have to reinvent the wheel, and for consistency for users as well. I’m thinking that it should apply to forum posts as well as GitHub issues, since the core problem is the same: if somebody is to spend their time trying to figure out what’s causing the problem, they should at least know that the information they are provided is genuine.

So, if you want to do it the “judicial way”, maybe what I’m asking for is a guideline that various repos and the forum can opt to follow. Those of us that don’t enjoy wasting time chasing down hallucinations could then opt not to participate in those areas that choose not to follow the guideline.

Regarding your points, I agree with #1. I don’t think it should be outright banned, but it must be clearly marked so that there’s no doubt what is genuine and what is “AI” generated content. I’d say that #2 might need a slightly stronger tone, but I agree that it shouldn’t automatically mean that the issue is closed, although that might be a necessary measure if people aren’t willing to comply with the disclosure requirement.

Regarding #3, I think I pretty much disagree, the exception is very simple PRs that update dependencies etc., but those should be initiated by maintainers, not by users/contributors. This is outside the scope of what I meant to deal with here though, this was only meant to be about the communication itself.

I fear that #4 is necessary, but shouldn’t be handed out eagerly. People must have the chance to understand what they do wrong, and still keep doing it before I think a ban is appropriate.

I absolutely understand what Flatpak has done, and I think we’ve only seen the start. The problem is that it can be hard to tell, so I’m fearing that this can lead to overly harsh “punishment” to try to compensate - as humans usually do when they have a problem that they can’t really get a grip on.

But, at least having a guideline on the topic is necessary, because otherwise you can’t really blame people for doing it. You must try to explain it to each individual and appeal to their understanding of what they expose others to, which is a very hard thing to achieve with some people (the entitled/rude group are pretty much immune to reason and to considering the situation of others).

I think just ignoring the issue guarantees that it will just be a growing problem.

Back when it was established @kai was very concerned and very deliberately wanted the AC to be something like a “supreme court” rather than a dictator. And that’s mostly how it’s operated since then.

It could be more proactive but that would go against its original intent and I personally, speaking as a member of the AC, would feel uncomfortable if it were to do so unless there were unanimous consent among all the maintainers (or at least a super-majority) of all the OH repos to give it that authority. But then be careful what you ask for because this won’t be the only issue the AC will have something to weigh in on.

I’m ambivalent on this one. For one, I don’t think it’s a problem yet on the forum. I can definitely see it becoming a problem but so far it’s pretty obvious when AI is involved and it hasn’t been too much of an issue yet (or if it is I’ve missed it). I definitely can get behind a “use of AI must be disclosed” policy but it will be hard to police. Even I don’t read every post made to the forum and there are a few users I’ve added to my ignore list (mainly because of past behavior).

I also don’t want the forum to become too quick to accuse users of posting AI slop. Assuming users are telling the truth (I’ve no reason to believe otherwise) I’ve mistakenly accused users of posting weird AI generated code when it was in fact their own. If we start pointing our fingers at everyone I fear it will make the forum less welcoming and friendly, and those are among OH’s greatest strengths I think.

Sure. Any member of the openHAB project on GitHub can start a new thread for the AC. Anyone can open a discussion thread on any openHAB repo. We don’t need consensus here to get something moving. I think starting with the repos is a better approach but I can’t stop anyone from moving forward and going straight to the AC (and I wouldn’t do so if I could).

My intent with 3 was that any AI submitted PR (I agree excluding dependency upgrades) should be expected to have been tested by the submitter at least as much as we expect a manually created PR to have been. I’ve seen on other repos AI submitted PRs that won’t even compile and certainly don’t pass the unit and integration tests.

That’s why I said “repeated”. A pattern of behavior must be established and the user must have had plenty of communication informing them of where they are going wrong and how to correct the problem. And I would not make the bans permanent either.

Again, enforcement isn’t my primary concern, I think that stating that this “isn’t OK” is worth something in itself. I’ve seen it on the forum. It’s not that the users start with AI content, but rather that when you ask them questions they might not know how to answer, some tend to just feed it to some “AI” instead of asking for clarification about what they don’t understand or know how to do. And, then they either paste the “AI” answer, or refer to parts of it like it’s some kind of fact, and we can go many rounds before it eventually comes out that this was never a fact, it was an “AI” claim.

To me, the most important aspect of this is to make it clear that dumping “AI” garbage whenever you don’t understand something, isn’t “good etiquette”, it isn’t respectful, and those that you communicate with aren’t likely to appreciate it when/if they figure out that this is what it is. When nothing is stated on the topic, we rely on each individual to realize this themselves, and I think that the chance that those that never try to solve problems themselves realize this on their own, is slim.

I agree with this concern. I have a “writing style” that some think looks like “AI” content, so I’ve already experienced being rejected because of this. But the solution can’t be to take no stand, because individuals will be put off by it regardless. Saying that “if you post “AI” content, make sure it’s clearly marked as such” doesn’t really impact the threshold for “accusing” people of doing it as far as I can tell.

I just raised this as a suggestion, something to think about. I’ll not proceed with any “formal requests”, that’s not really my point. I just want people to think about it, to figure out how it should be handled, instead of just letting it grow into a very tense issue with lots of strong emotions. I already “declare” that I’m not willing to spend time on such content, and I can just keep doing this for my own sake. I just fear that this will become something that grows increasingly tense and sour.

It’s worse than that in my opinion. It’s relatively easy for an “AI” to make sure that it passes tests, it can just tweak the code until it does. That doesn’t make the code correct, or even sane. Tests are nowhere near being something that we can rely on to assure correct behavior, I don’t know if that’s at all achievable, but it certainly isn’t reality. If passing the tests is your goal, it’s usually pretty easy to tweak the code to satisfy the tests (like VW did with their diesel emissions).
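To make the point concrete, here is a minimal, hypothetical sketch in Python (nothing here comes from an openHAB repo; the function names and the tiny test suite are made up for illustration). Both implementations pass the same tests, but only one of them is actually correct:

```python
def is_leap_year_gamed(year: int) -> bool:
    # "Tweaked until green": simply hard-codes the leap years the tests check.
    return year in {2000, 2020, 2024}


def is_leap_year(year: int) -> bool:
    # The actual Gregorian calendar rule.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def run_test_suite(fn) -> bool:
    # A small, made-up test suite; real suites are larger but just as finite.
    cases = {2000: True, 2020: True, 2024: True, 1900: False, 2023: False}
    return all(fn(year) == expected for year, expected in cases.items())


if __name__ == "__main__":
    print(run_test_suite(is_leap_year_gamed))  # passes the suite
    print(run_test_suite(is_leap_year))        # passes the suite
    print(is_leap_year_gamed(2028), is_leap_year(2028))  # they disagree on 2028
```

A green test run cannot tell these two apart; only a reviewer who reads the code can.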

I’m not sure that I’m interested in “AI” generated code at all, in most circumstances. It’s just not worth the hassle to try to discover and correct all the “logical flaws” it’s done along the way. If you don’t know how to solve something, have it generate suggestions and take inspiration from some of this if useful, but don’t actually use that code, is my take.