VS Code and Copilot free plan

First, I’d like to say that I’m really not at all convinced that we’ll save time by using “AI agents” when developing. I use no AI when I write Java, and I’m quite happy with that.

However, I’ve been making some JavaScript/TypeScript/Vue for an upcoming MainUI PR lately, and since these are languages that I’m not very familiar with, I typically don’t know the exact syntax for what I want to do. I’ve thus enabled Copilot free in VS Code to alleviate this. Even though most of the “suggestions” are bonkers and wrong, the constant flow of suggestions often tells me how to do something that I would otherwise have to look up the syntax for.

In that sense I consider it helpful, but it also gets in the way in that it constantly suggests bonkers things that I end up giving some thought “just in case they aren’t bonkers”. This is a drain, and I don’t think people fully realize how much it costs. Regardless, for a language I’m unfamiliar with, I think it comes out on the plus side in total, but I don’t consider it a large plus.

However, the free plan is a joke. I spent one month of “free suggestions” in a few days. It’s obvious, they just want people “to get a taste” and then pay for more. To me, paying for it is completely out of the question, I have no intention of financially supporting this “AI hype”.

It’s actually quite annoying the way it constantly “suggests”, especially when I know that all the BS it makes “eats my quota” that I could have used for when it’s something I actually need “help” with. But, it seems that suggestions are either on or off - I can’t get a suggestion only when I want one..? If that were possible, I think the free plan could go a long way for “casual coding in unfamiliar languages”. When you know the language properly, it only gets in the way, but you can only know so many languages at that level.

My question is: What do you people do? Do you just turn it off, do you manage with the free plan, do you use Copilot alternatives that let you use your own LLM, or do you simply pay for it?

Try opencode, its model is not bad and the free quota is decent. For pet projects I rely on Claude (TypeScript is still a mystery to me), but to avoid constant interruption I bought an entry-level subscription.

Earlier I gave Kimi a try, but its tooling wasn’t great. At work I use GPT and on-premises Qwen (2.5 and 3.5), which provide decent suggestions and hallucinate less often than GPT 5. For me, LLMs have become a quicker way to search for information, hence the massive decline of active users on Stack Overflow.

I’m fiercely against everybody that wants to make money off this, so I will under no circumstances pay. I also won’t sign up for accounts if they are likely to nag me to “upgrade” to a paid plan. If I were to find an alternative, I would run an LLM locally (I have a 12GB GPU that hopefully would be able to run something useful). But, it’s not that beneficial IMO - it causes a lot of wasted time too, not to mention that you don’t have to learn/understand things as well, which probably means that the quality of your work will be poorer. But, when I have no intention of learning something properly, it’s tempting as a shortcut.

That’s one of the great pitfalls with this. “AI” can’t produce anything worthwhile unless humans have first shared the knowledge, so if people stop contributing knowledge, we will all suffer - “AI” users and non-“AI” users alike.

edit: I think the greatest reason for the reduction in SO use is most likely Google - because when you search for something, their “AI” usually is the top reply, and if that seems reasonable, you never get to the SO link. It’s a bit invasive that they do this actually, yet another reason to abandon Google.

Depending on the hardware you have, why not try a local model? I was very surprised with how good Gemma 4 actually is, and I’m running the 4b model on 10 year old hardware.
It’s not lightning speed, but I can easily read along while it’s answering, which is enough for most of my needs.

That’s absolutely a possibility. It’s an RTX 2080 Ti, so it’s quite capable, and I’ve generated quite a lot of SDXL images with it. But, I just don’t know if I can be bothered to install/configure it. The benefit is quite marginal, and then it comes down to getting to know which models are capable/have good quality and which aren’t, and all that. You can quickly waste a lot of time with those things.

I can do without it, I just hear so many mention that they use it for this and that, and I’m wondering how people manage to see this “as an asset”, given how little use I got out of Copilot before it wanted money. Not little in the amount of text it has generated, because it constantly generates BS, but little in terms of actually useful suggestions.

I mean - I get your point, I truly do - but there are many layers to this. System prompts and agents make a world of difference. Copilot and ChatGPT are also not really the best options right now. Gemma, Claude, Qwen and DeepSeek seem to consistently outperform them in coding tasks.
All of this is still very new so I’d say, don’t waste time but instead invest it. Set it up and run it alongside what you are doing. Ollama makes it super easy to set up everything, and if you want to delve deeper into it you can try openclaw. What I mean to say is: think of it more like an adventure. We’re all figuring this out together.
(But do try Gemma 4, the 4B model surprised me a lot. But the Chinese models are fantastic for coding too.)
Edit: regarding the benefit being marginal, it depends on what you end up doing. A free, unlimited model you run yourself is tremendously more useful than a paid one where you’ve used up all of the tokens :confused: so it’s only marginal if you are not using the model.

Edit 2: for context, because I know it also helps to understand my point of view. I don’t know how to write code. Best I can do is edit, but very limited. I use Gemini and Gemma daily to write entire blocks of YAML for my ESPHome devices. I have pretty darn amazing sensors that I would never have been able to make without AI. Things that calculate the sunset time from my NTP server location, and then compare it to the light sensor reading to output a message on whether or not the amount of light is too much for the time of day. This is nothing complicated, I could easily do it in a rule - but I have it on the client itself, outputting it directly to openHAB. It’s pretty darn cool - and yes I know you will probably do it better but think about the 80% - 100% rule. That’s how you will save time :slight_smile:
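
To make the sunset/light check above a bit more concrete, here is the same idea as a minimal Python sketch. It is not the ESPHome YAML that actually runs on the device; the function name and the 50 lux threshold are made up for illustration, and the sunset time is simply passed in rather than calculated.

```python
from datetime import datetime


def light_status(now, sunset, lux, max_lux_after_sunset=50.0):
    """Report whether the measured light is plausible for the time of day.

    max_lux_after_sunset is a hypothetical threshold; tune it to your sensor.
    """
    if now >= sunset and lux > max_lux_after_sunset:
        return "too bright for the time of day"
    return "light level OK"


sunset = datetime(2024, 6, 1, 21, 30)
print(light_status(datetime(2024, 6, 1, 22, 0), sunset, lux=400.0))
```

On the device you would feed this from the sun component and the light sensor instead of hard-coded values.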

When I’m talking about marginal benefit, I’m referring to when it’s “working”. The signal-to-noise ratio isn’t very good, which is why it also wastes a lot of time. I’m not actually sure if it saves time at all, but it’s alluring to get “something to work with” quickly instead of having to gather information first. But, by the time you have wasted time figuring out all it has claimed/suggested that is wrong, it’s hard to say how much time you’ve saved.

I’d say that it seems to me to be best suited to a task you’re pretty poor at, but not completely hopeless. If you’re good at it, it’s just quicker to do it right from the start. If you’re completely clueless, you won’t know enough to realize when it’s misleading you, so you need a certain knowledge “as protection”.

A hearing aid might be an analogy. If you have good hearing an aid would be a nuisance. If you don’t hear so well it may help you be part of the conversation and stave off isolation.

My AI thinking was to learn what it was all about and do some programming, hopefully to help others. My formal training 50 years ago was Fortran, so I started pretty basic. My OH rules are still DSL, which I seemed to understand with help on the forum.

All my OH rules are pre-AI and I had a few PRs (in Java) before AI, but AI seems to expand the possibilities. Also, it appears at least some OH maintainers are using GitHub Copilot to review PRs (so when in Rome…). I don’t like when mistakes are found in my code.

In VSC I do find the fill-in-the-blank aspect irritating and often not what I was typing. The agents tend to be pretty conservative with checks and tests. I have on occasion had to eliminate what it was suggesting. I do review for understanding and test everything, so that doesn’t save time.

My plan was to try the paid $10 a month for a while. The free level is not enough. I did notice that OH developers might get that tier free, but I’m not German and not a big contributor. I can afford it for the learning.

I did try a vibe code to compare a heat pump with a furnace. It was kind of neat.

I think that the Copilot reviews are very useful. That’s a very different situation than when it is tasked with “creating”. It does make mistakes when reviewing, but in my experience, not often enough to be a burden in total. It does find a lot of small stuff that humans wouldn’t, because paying attention to all those details is so mind-numbingly boring that humans just “fade out” after doing it for a little while. Sometimes it’s “lucky” and points out larger things also, but I really think that it shines at finding those small things. Its greatest strength here isn’t that it’s so smart, but that it doesn’t get bored :wink:

I don’t dislike when errors I’ve made are pointed out. In fact, I appreciate it, because what I really hate is that I make errors that aren’t found, and have real impact. So, for every thing a review reveals that I can remedy, I’m grateful and pleased.

On occasion? I find that, most of the time, it does the wrong thing, and it’s usually only when I’ve almost completed it myself that it finally manages to suggest something sane/useful. So, to me, it’s quite the annoyance having all that drivel constantly injected into the text I’m working with. Except, if I don’t know what to write, then I can look at all the crazy it suggests, and sometimes pick up enough information to know what I need to do.

It’s also quite capable when I have to do “mass edits”. I mean, when I have to repeat the same kind of change over and over again. After I’ve done it 4–5 times, chances are that it manages to suggest the next edit correctly, which does save time. But, I generally don’t do “mass edits” on a large scale, I usually try to plan before I do instead :wink:

Remember that these days, your most important “vote” is your wallet. My philosophy is: Don’t support anything unless you agree with what they are doing. It’s not about what I can afford, that’s not actually a part of the calculus at all for me. I know that if we reward them for this, they will keep increasing the prices and introducing more restrictions. They want to get people “addicted” and then squeeze them for all they can. I’m not about to reward that kind of behavior.

For now I turn that off. But I’m slowly building up LLMs on my machines and intend to experiment with FOSS models. I suspect VS Code has plugins to support either ollama or OpenAI that can be pointed at your LLM instead of a cloud service.
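
For anyone wondering what “pointing at your own LLM” looks like underneath, here is a minimal Python sketch, assuming an Ollama server running on its default local port (11434) and its /api/generate endpoint. The model tag and the helper names are placeholders I made up for illustration; use whatever model you have actually pulled.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address (assumed setup)


def build_generate_request(prompt, model="qwen2.5-coder:7b", base_url=OLLAMA_URL):
    """Build (but don't send) a non-streaming request to Ollama's /api/generate."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def ask_local_model(prompt, **kwargs):
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_generate_request(prompt, **kwargs)) as resp:
        return json.loads(resp.read())["response"]
```

With a server running, `ask_local_model("Show me the TypeScript syntax for an optional function parameter")` would return the model’s answer as a string. An editor plugin does essentially the same thing, just with your open file as part of the prompt.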

I can’t speak for how well they perform compared to the cloud models but I suspect they are about as good as the cloud models were six months ago, which is pretty darn good.

I’ve not done this for coding yet though. I’ve only used it for recipes (i.e. take a photo of a cookbook, grandma’s handwritten recipe cards, screenshots from some posting on Reddit, etc.), then OCR and parse out the ingredients and steps and load them into Mealie. I can say my current LLM setup works as well as the free tier of ChatGPT.

I’ve also set up an LLM with Paperless-ngx. It’s currently chewing through almost 6000 documents, doing OCR and auto-categorization. I can’t compare this to a cloud AI because I’m not sending my important documents to the cloud under any circumstances. But with the LLM it’s more than good enough and it even handles the obnoxiously graphic and colorful bills that traditional OCR chokes on.

I’m running on a Framework 16 with an AMD RX 7700S with 8 GB of VRAM (the main machine has 32 GB of DRAM). For dealing with imagery (e.g. OCR) it definitely hits the CPU and takes about a minute per page. But text-based operations (e.g. processing the OCR text to find the title, dates, type, and tags for a document) barely register on the laptop in terms of RAM and CPU, and they take one to two seconds.

I suspect coding operations to be similar to the latter. In my experience, you have to experiment with the models and the prompts to get good results.

All that is to say a 12GB GPU is more than enough to run small and mid-sized models (when looking at the model names, the number at the end is its size, i.e. the parameter count in billions). What you can run will depend on both GPU RAM and main RAM though, so YMMV.

There are extensions to remove the AI results from search results. Or you can use a self hosted search aggregator like SearXNG which also removes the AI results.

Mainly because I’ve been able to do things with its help in a few hours that would have taken me hundreds to thousands of hours to do on my own.

I did use Gemini to help configure my home network better, which saved me hundreds of hours of trial and error to get it right (see my two most recent tutorials). It has its place and it is incredibly useful. It’s also incredibly powerful, which means it’s also dangerous, and not enough care is being taken.

But so far, it’s been a huge asset for me.

Where that line is will vary from person to person and project to project. For me, I can unequivocally say it’s saved me hundreds of hours of research and trial and error. But it definitely led me astray more than once.

Absolutely this. But some of the AIs are actually pretty good teachers. I’ve even had Gemini generate an interactive graphic to explain certain concepts to me (which I double-checked the references for, of course).

It’s a tool like any other, and any use of a tool without knowledge is dangerous.

When I dabbled with image generation, the local models were at least as good as the cloud models - better in many ways, because you can configure them like you want, add finetunes, LoRAs etc. that suit your purpose. And you can run large models if you accept that it takes time, at least to some extent. I have 64GB RAM on that machine, and I’ve configured the video driver to be able to “borrow” RAM from the main RAM when needed. Thus, I’ve worked with things that consume up to 24GB memory. But, transferring the data back and forth between GPU VRAM and RAM is slow, and the GPU can’t actually access the data in RAM; so there’s a very significant slowdown once you start working with models that don’t fit in the GPU VRAM.

With images this is sort of acceptable, because you can work with it with a lower quality version of the model until you’re happy with the result, and then do the final generation using the very slow high quality model. Unfortunately, the result isn’t simply a “better quality” version of what you had, it can differ significantly, but still you can get some way by working this way.

That’s not really a good option when working e.g. with code. You don’t want “a lower quality version” first, so I guess you’ll have to find a model size where you can live with the speed/quality equation, and then use that. I haven’t tried doing this, so I don’t really know how well it would work.

Yes, I can also just scroll by them, but they often are a shortcut to find what you’re after. I was just pointing out that the fact that they do this probably kills a lot of traffic that would otherwise go to e.g. SO, but if that leads people to stop sharing knowledge on sites like SO, the “AI”s will also suffer, because they need that information to have something to regurgitate.

I was referring exclusively to “coding help” here - I already think that e.g. code review is quite useful, and I can definitely see that there are things that can be done much faster than doing it manually in various areas, although I haven’t really had (or seen) the need yet.

I can only start to imagine how complex your network must be. For me, it’s quite simple and nothing I “need help with” (a few subnets, a firewall that coordinates/routes and controls traffic, and a few site-to-site VPN tunnels).

Again, I was referring to “coding help” specifically, but I guess that this will still apply. The project and the person will probably largely dictate the benefit. What I see is that it has a tendency to “fool me” into listening to it because it sounds like it knows what it’s talking about. By the time I realize that it really has no clue, I’ve already wasted a lot of time, but usually also learned something. When the dust settles, it’s hard to estimate how much time the process would have taken if I ignored it from the start. I suspect that I’ve sometimes spent significantly more time than I would have if I just ignored it from the start, but I’ll never “know” this for sure, since once you know, you can’t really try to figure it out another way to compare.

They can be good at explaining/teaching if what they think they know is factual. But, it turns out that it often isn’t, and if they “teach you” things that are in fact wrong, they don’t do you any favors. I’m also pretty sure that a lot of people don’t have the necessary discipline to double-check their claims, and sometimes it can even be hard to do (not all information is that easily accessible).

It’s mostly complex because I need to dual-stack IPv4 and IPv6 for Matter support, but I also want network-based parental controls, ad/malware blocking, and dual-horizon DNS so I can use proper URLs with a trusted certificate from Let’s Encrypt for my self-hosted services without any traffic leaving my LAN.

None of that is really all that hard, but I’m not a network engineer so I didn’t know everything I needed to know to make it work properly. I could have learned it the way I usually do, by reading tutorials and docs and watching YouTube videos. But I was much more effective and efficient with the aid of Gemini. It was particularly helpful to get summaries of YouTube videos. I hate watching tutorials on YT.

But now I have DHCP working properly (it works so well that I’m probably going to abandon statically assigned IPs at some point). I have parental and malware blocking on all devices on my network. Parent devices have exemptions to some of the blocking rules and IoT devices have stricter controls. No IPv6 traffic leaves my LAN but I can still support Matter. And all services are available with a normal https address (no port numbers needed) with a cert issued by a trusted CA. So family can, for example, use https://openHAB.mydomaon.com and it works without warning.

When working with LLMs and such, this is where prompt engineering comes in. Adding a little bit like “do not guess at solutions and only offer suggestions you can find in a reference document. Please provide the link to the reference document.” can dramatically change the results you get. And it gets weird, as some models work better with short and to-the-point prompts and go haywire if you are polite, others need to be threatened. Others work well with politeness. It’s a strange new world.
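
If you end up adding that kind of rule to every question, it is easy to script. A minimal Python sketch, where `with_grounding` is just a hypothetical helper name and the wording is the example from above (no guarantee that any particular model obeys it):

```python
# The rule text is the example wording from the post above, lightly capitalised.
GROUNDING_RULE = (
    "Do not guess at solutions and only offer suggestions you can find in a "
    "reference document. Please provide the link to the reference document."
)


def with_grounding(question, rule=GROUNDING_RULE):
    """Append the grounding rule to a question (hypothetical helper)."""
    return f"{question.strip()}\n\n{rule}"


print(with_grounding("How do I declare an optional prop in a Vue 3 component?"))
```

The same text can usually go into a system prompt instead, so you don’t have to repeat it per question.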

Gemini at least provides links to the source material it uses so you can judge if what it’s teaching is correct. But it’s also been specially optimized for this use case, so when it is in teaching mode it will err on the side of being correct with references over hallucinating.

Its notebook feature is also really good. With this you load up the source materials you want to base the responses on and the AI will only use those instead of reaching out to a web search or making stuff up.

A lot of people don’t pay attention when driving a car or in choosing what news media to consume. You can take a horse to water but you can’t make them drink.

But my point overall is that most of these generic models are pretty awful out of the box. They are too obsequious and too willing to let you go down the XY Problem path. And they do guess at a lot. Often they are reasonable guesses but they are guesses all the same.

But there are ways to get these to work the way you want them to with a little effort. Most people won’t put in the effort, but that’s on them and not necessarily the tool’s fault. Any tool poorly used will produce poor results.

I would have expected Copilot to be a bit better than reported here, and when I did mess with it a little for some Ansible playbooks it was OK. But I decided I wanted to go all local. I want to know just a little more about how it all works. So I’ve stopped experimenting with Copilot.

Don’t get me wrong, not all suggestions are wrong. Some of the problem is in how it’s made as well, when you start typing something, it doesn’t ask you what you want to achieve, it just starts guessing without any context except what’s already in the document. A human would have a terrible “hit rate” at that too, but crucially, wouldn’t just “take over” and try to complete your sentences with no idea what your goal is.

This is why it gets “better” the closer to completion you are, it has more context to guide it. At the same time, all the disturbance it has caused along the way isn’t “without cost”, especially when this rambling eats up your “free tokens”. When the options you have are “on” or “off”, the result is in many ways given. At the same time, it’s obvious that Microsoft thinks that what “this process” provides is a good incentive for people to pay for it. The problem might be more with Microsoft than Copilot here, if they had laid the foundation with a better “interaction model”, the suggestions might have been better as a result.

It sounds like Copilot’s UI is awful more than anything. But I would guess there has to be a less granular way to invoke and use it in VS Code. Rather than helping line by line, something more like “build a function to do x taking arguments y and z”, which you can then refine as necessary.

In my limited experience AI does a much better job given a bigger block of work to do than when trying to suggest as you type.

There are two “independent channels”:

“Inline suggestions” is just that, it just tries to finish whatever you type by filling it in “semi transparent”, and if you press Tab, you “accept”. The problem with this is partially that it’s distracting, partially that you lose track (or at least I do) of where you actually are. If you never press Tab, it won’t be a part of the document, but it also means that you must “imagine” how the document would look without its “suggestions” and just navigate from that. But, sometimes you want to use Tab, and I’ve had some “accidents” with that.

“Chat messages” are what you’re thinking of, where you can write “a prompt”. It can also make suggestions/edits in your code, but they are presented completely differently (like a diff) and you can accept or reject them. “Chat” is more useful if you want to figure out how to do something, although I bet I’ve used more than half my quota on “Why does VS Code do this?”, “Why did that just happen?”, “How do I prevent VS Code from doing X?”, “Why is that red”, “How can I actually find this information in VS Code” etc.

But, when working with a language I don’t know, it’s often too much hassle (and results in too many “long journeys”) to ask the “chat” for the syntax. Because, I often know what I want to do, just not exactly how the unfamiliar language wants that expressed. The suggestions, while also distracting and confusing, can be quite helpful at the syntax part, because you can just start typing something in the direction you want to go, and pay attention to what it suggests along the way, and you might just pick up the syntax you’re after. It’s not a very “targeted” or efficient process, but still perhaps what I find “the most useful” - yet annoying.

It doesn’t seem like the “two channels” are connected, one doesn’t know what the other is doing. If this was different, it might be possible to “guide it” into doing less completely irrelevant stuff.

My point is that’s probably too small. I wouldn’t ask a chat about specific syntax. I’d have it generate a bunch of code and review what it did, maybe asking for clarification or telling it where it went wrong.

In my experience, if you are using AI as a kind of turbocharged autocomplete or ask it something too simple, the AI will invent more work for itself to do to “help”. And it’s usually not that productive when it does this.

But if you give it a bigger job to begin with it has less room to get “bored” I guess. At least that’s my experience.

So instead of typing and watching the suggestions or asking small questions about syntax, I’d try asking it to code up exactly what you want to do in total. Then review that block of code and refine from there. That should give you a bunch of syntax in a context you understand.

But again, I’ve limited experience with CoPilot and I’m assuming it works like the models I’ve worked with more.

I actually wouldn’t use CoPilot to ask questions about VSCode. I can get similar quality or better answers without it using up my tokens in the AI box on any of the search engines or the free tier of ChatGPT et al. I’d save CoPilot for coding problems only.

I’m far from an expert in this stuff. But those are the lessons I’ve learned so far on how to best use these tools.

That’s rarely the situation I have. I usually have something that I want to enhance, modify, improve in some way, not blank sheets for it to “populate”. And if I had, I would be extremely skeptical of the quality of the “design” it made for the overall project. It would probably be a big task to verify that the “logic structure” it had made was sound, potentially larger than designing the structure myself in the first place. But, it’s hard to know in advance. I’m not against letting it “suggest a model” and then using that as a starting point for my own thoughts. But, I would never just “trust” that what it had come up with was reasonable.

As I’ve said, the place where I’ve found it helpful within VS Code is to help me figure out the syntax quicker than I would achieve by having to look everything up. But, it’s not without its frustrations.

It’s not so “strange” why it’s better at presenting a full model than at modifying something existing. When constructing something from scratch, it can use what it has “learned” from people with knowledge in an area without “understanding” too much of the details. It more or less just “repeats what it has been taught”. Modifying something existing is in many ways more difficult, you must first “understand” what is there, analyze its goals and weaknesses, and then suggest improvements from there.

I haven’t really been “trying to save tokens”. I’ve used Google for quite a few things as well, but it has some annoying limitations, like that you can only paste a limited amount of text. So, it’s very hard to “share” the actual code with it. And, I’ve gotten it “stuck” over and over again. After some rounds where I’ve pointed out why this and that doesn’t work, and rephrased the problem description to the best of my ability, it stops responding. After a while I might get some generic “We’re currently experiencing problems” or just “Something went wrong - try again later”. To me, it seems like it “crashes”. I haven’t yet had Copilot do that, but that might be a coincidence.

But, the major benefit is that Copilot has access to the code in your IDE, you don’t have to keep trying to paste stuff that gets cut off, where it then responds as if the snippet it got is the whole thing. It can search through the whole project for uses of something for example, or how similar things are done other places. I guess this is the same reason why I’ve asked it about VS Code, assuming that it has access to information so that I don’t have to feed it everything. It’s certainly not “the most efficient use” of the “free tokens”, but as I said, I haven’t been trying to “save” either. I still have “chat time” left, and it will reset in 4–5 days. But, the suggestions have been “out” for quite some time now.

I don’t use Copilot except for the Copilot PR review functionality on GitHub. I gave it a few tries several months ago when the Student Pro Tier had way higher limits than today, and it really sucked, both on its own and when compared to Gemini CLI (I get Google AI Pro for free as a student).
When coding, the only suggestions I get are the ones from IntelliJ, which IIRC uses a specialised local AI model for that. They are okay and not annoying the way Copilot often is to me.

In general, I think I have these major use cases for AI agents:

  • Code Review: Copilot PR Reviewer as well as my self-hosted tooling that runs Gemini CLI internally.
  • Code Exploration: When working on parts of code I’ve never touched before, AI is really helpful in reading a ton of code, and creating diagrams and other stuff explaining how things come together. This really saves me a lot of time.
  • Investigating Bugs: When I have a bug report, I often let Gemini CLI investigate it, it often finds something helpful. This needs GitHub MCP to read issues.
  • Coding: I regularly generate code, but I usually have a very clear idea what I want before starting off, then describe that idea and let the AI generate a plan, give feedback on that plan and finally approve it. This allows me to work on more things in parallel than without AI, and do more, but I also regularly throw away what was generated and start over.

As I’m regularly on the train/bus travelling between my home town and my university town, it’s especially valuable that I can run/control AI agents from my phone. They run in a VM on my own infrastructure with their own GitHub account, so they can push to my forks and I can pull that later and continue on their work.

With all that usage above, I’ve never reached my limits in my free pro plan, I don’t really care about my usage. Even the 20$ a month would be worth it in my personal opinion.

If you want to give local models a try, the larger Gemma 4 models and Qwen Coder models are said to be quite good. As for the harness, I would give both OpenCode and PI a try.

I’m so fed up with Google (blocking FOSS on Android and changing reCAPTCHA to require the use of their proprietary Google Play Services are the two latest in my mind, both completely unacceptable IMO) that I have a lot of resistance against taking anything new that is in any way affiliated with Google in use. I thus can’t get myself to try Gemma.

When you say Qwen Coder, are you thinking of a model like, for example, this?

4 GB model files feel too small to get something useful out of, but maybe that can’t be compared to the world of image generation that is “my reference” when it comes to model sizes?

Edit: When looking further at it, I am confused, there are 40 4 GB model files in “one model”. Are you supposed to use all of them to run one model? If so, the total size will be 160GB…

I haven’t looked that deep into which version and variant of it to use, but generally speaking you’d better check ollama.com or lmstudio.ai for models; they also have the “inference” software.

What’s on HuggingFace are usually the raw models, e.g. FP16. I can imagine that they are split into several files. For inference, Q8 and Q4 variants are way more interesting, these are for example on Ollama: qwen3.5
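
To put rough numbers on the FP16 vs. Q8 vs. Q4 difference, here is a back-of-the-envelope Python sketch. It assumes the weights dominate the size (parameters times bits per weight), ignores the per-block overhead real quantized files carry, and uses a made-up 1.5 GB headroom for context/KV cache. The 7B model is only an example, not a specific recommendation.

```python
def weight_size_gb(params_billion, bits_per_weight):
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits_in_vram(size_gb, vram_gb, headroom_gb=1.5):
    """Rough check; headroom_gb is a hypothetical allowance for context etc."""
    return size_gb + headroom_gb <= vram_gb


for name, bits in [("FP16", 16), ("Q8", 8), ("Q4", 4)]:
    size = weight_size_gb(7, bits)
    print(f"{name}: ~{size:.1f} GB, fits in 12 GB: {fits_in_vram(size, 12)}")
```

By the same arithmetic, 160 GB of FP16 files would correspond to a model of roughly 80 billion parameters, so the split files do belong together as one model, but nobody expects you to run that variant on a 12GB card.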