From autocomplete to agents: a year of AI-assisted software delivery

A year ago I was using AI to help me write single lines of code. This week I’ve been running Claude Code from the command line, delegating chunks of work to sub-agents, and reviewing their output. Both of those things count as “using AI for software development”, but they are wildly different activities, and the gap between them is where this post lives.

I want to be clear: I haven’t been swept up in the hype. My use of AI has grown gradually, tracking what the tools have actually been capable of at each point, rather than what the marketing promised. It’s also worth noting here that the majority of my day-to-day work over this period has been for a government-owned client with very strict AI usage rules. Most of what follows has therefore played out on side projects and other engagements — a detail I’ll come back to at the end.

Here’s how it’s played out.

Spring 2025 — IDE autocomplete

Like most developers, my starting point was GitHub Copilot living inside the IDE. The model would predict the next line (or sometimes the next few), I’d hit tab if it was right, and carry on. When the suggestions were good, they felt like a confident colleague finishing your sentence; when they were bad, they were confidently wrong in a way that took a beat to spot. Some snippets were truly dreadful – I remember a few auto-generated chunks of Bicep with so many unnecessary attributes set to questionable values that it would have been much quicker to RTFM in the first place and write them from scratch.

The thing I noticed was that using autocomplete well required exactly the same skill as writing the code by hand. You had to know what you wanted before you could tell whether the suggestion was anywhere near it. The cost of not knowing wasn’t that the tool failed — it was that you accepted something subtly off, and then had to pay for that later.

Autocomplete doesn’t let you leave the driver’s seat. That’s a feature, not a limitation.

It also had a clear ceiling. It couldn’t tell you anything about your overall approach, couldn’t help you decide whether to refactor first, couldn’t discuss a trade-off with you. For that, I was still either talking to colleagues or going elsewhere (yes, I’m a dev, so that usually ended with a Stack Overflow page).

Summer 2025 — ChatGPT in the browser

“Elsewhere” quickly became ChatGPT, and asking “am I over-engineering this?” or “how else could I approach this?” turned out to be a surprisingly good rubber-ducking experience. The model didn’t always give the best answer, but prompting it forced me to articulate the problem, which was often the more valuable half of the exchange. Some answers were out of date, so you got more reliable results asking about general patterns rather than how to use specific packages.

The limitations were also mechanical. Copy-paste friction. No awareness of the rest of the solution. A fresh context with each new chat. I’d find myself pasting in three files of context before I could ask my actual question, and then doing it again two hours later.

Even so, this is where my AI use went from “smart autocomplete” to “thinking partner”. It didn’t write my code; it helped me figure out what code I wanted to write. The distinction sounds small but it’s actually the whole story.

Winter 2025 — Claude and Junie plugins in Rider

The Claude plugin for Rider was the first time a model had proper access to my codebase. I could point it at a class and ask a real question about it, without the copy-paste ritual. Suddenly the friction was gone, and suddenly I was using AI much more of the time.

I found significant benefits when working with front-end code – Razor pages, CSS, React. It’s been a long time since I’ve considered myself a ‘real’ full-stack developer – I’m mostly focused on microservice architectures and event-driven platform engineering. AI hasn’t given me new knowledge about how to do front-end work, but the Claude plugin let me implement the patterns I was already familiar with (MVVM, MVP) without getting bogged down in how specific JavaScript frameworks behave, or having to play with stylesheets for a month.

For a single solution and a single change at a time, the IDE plugin workflow is brilliant. You stay in your IDE, the model has the context it needs, and the work flows. The IDE vendors do a pretty good job of automatically providing enough context to the models for a decent experience, without you having to tweak anything.

This is also the stage where “you still have to know what code you want” became an actual discipline rather than a by-product of the tool’s limitations.

It’s where I first caught myself noticing the temptation. With a model that could generate a full class confidently, I could feel the pull to just accept the code and move on. That’s the wrong instinct, and the answer was the same as it’s always been: read the code, understand the code, make the change your own before moving on. The stickiness comes from a change in scope – when I’m writing code myself, I prefer TDD and won’t implement a particular method on a class until I have code that wants to use it. With AI, it seems more appropriate to ask it to go further – “give me a repository class with basic CRUD methods”. If you aren’t using it all right away, it’s easy to gloss over problems.
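The kind of ask I mean looks something like this – a hedged TypeScript sketch (the `User` type and in-memory store are hypothetical stand-ins), showing the full-CRUD scope an agent will happily generate even when only some of it has a caller yet:

```typescript
// Hypothetical entity – stands in for whatever the feature actually needs.
interface User {
  id: string;
  name: string;
}

// The "repository class with basic CRUD methods" ask, in miniature.
// An agent will gladly produce all four methods; TDD would have produced
// only the ones a failing test already demanded.
class InMemoryUserRepository {
  private store = new Map<string, User>();

  create(user: User): void {
    this.store.set(user.id, user);
  }

  read(id: string): User | undefined {
    return this.store.get(id);
  }

  update(user: User): void {
    if (!this.store.has(user.id)) {
      throw new Error(`No user with id ${user.id}`);
    }
    this.store.set(user.id, user);
  }

  delete(id: string): boolean {
    return this.store.delete(id);
  }
}
```

If only `create` and `read` have callers today, `update` and `delete` are exactly the un-exercised code where a glossed-over problem can sit unnoticed.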

Subtly, this is the same problem I see with the usual code review flow (see [my post here](https://codingdaddy.dobbs.technology/2023/07/19/code-reviews-without-pull-requests/)). Software development is most successful when context is well understood and well controlled. Reading something someone else wrote, or creating a method you don’t yet need, are both examples of context loss.

Asking Claude how to use Claude

The real AI watershed moment, for me, came when I noticed how much more other people were doing with AI (mostly via socials). When Opus was released (yep, that recently) I decided to buy an Anthropic subscription and see where it took me. I started by asking Claude:

I'm a senior dotnet engineer. I predominantly work with C#, but I'm quite at home with javascript, typescript, sql, plsql, ruby, and others. I tend to use Rider as an IDE, leveraging Claude and Junie agents to help. I'm finding I can build much faster with the help of AI, but I'm seeing posts on socials of people making much more use of agents than I am.

I'd like you to show me how to get the best outcomes, given the state of AI right now. I'm pondering the possibility of running multiple agents with different responsibilities, to work through a build. But I want to avoid unnecessary back and forth, where thrashing on code design just results in burning tokens. I've also never worked with agents outside of the basic Rider setup, so there will be a learning curve.

Think you can help out?

Spring 2026 — Claude Code with sub-agents

Claude’s response pointed me at its own command-line version – Claude Code – and at the idea of sub-agents with different responsibilities. I currently have a planner, a reviewer, and a test runner. These were suggested and mostly configured by Claude, and I don’t think they represent a ‘gold standard’, but so far the setup works really well.
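For anyone who hasn’t seen one: a sub-agent is just a markdown file with YAML frontmatter under `.claude/agents/`. This is an illustrative sketch of what a reviewer agent could look like – the field values are hypothetical, not my actual configuration, and the exact frontmatter fields may shift as the tooling evolves:

```markdown
---
name: reviewer
description: Reviews completed changes for correctness, style, and missed edge cases.
tools: Read, Grep, Glob
---

You are a code reviewer. Read the diff for the current change and report:
- bugs or behaviour changes the task description did not ask for
- violations of the conventions in CLAUDE.md
- tests that assert too little to catch a regression

Do not edit files. Report findings only.
```

The body of the file is the agent’s system prompt, which is why it reads like instructions to a person.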

I still think there’s a clear split between asking for opinions on how to approach something, making a plan, and implementing. I still use multiple sources to work out the ‘how’ – whatever I’m building will work well because I’ve been building and deploying distributed systems for two decades, not because I chose the right model to ask. I’ve also gone through the “no, I don’t like that approach” loop with the planner agent a couple of times and then just coded the change by hand. I’m still refining my agents and the way I work with them.

Until the tools get much better, I doubt I’ll ever move away from asking for specific code changes rather than whole new features. To me, Claude Code is a junior-to-mid-level engineer I haven’t spent enough time with to fully trust. But the size of change I’m asking for is growing in complexity every time, and the number of manual edits I need to make to correct its ‘mistakes’ is dropping.

The part I wasn’t expecting is the parallelism. I’m reasonably confident I could run two, maybe three, Claude Code sessions on completely unrelated work simultaneously without much task-switching penalty. Each one is doing a chunk of real work; I’m orchestrating rather than typing. Given a 20-to-25-minute turnaround on a change I spent five minutes defining, it seems like I have plenty of extra time.

Something that helps is to keep tweaking the CLAUDE.md and the agent.md files to try to nudge Claude in the direction you’d rather go:

## Testing
- Use XUnit for new tests.
- Keep assertions per test to a minimum — one logical assert where possible.
- Do not add comments to tests unless explaining non-obvious setup.

## Error handling
- Prefer throwing specific exceptions over returning nulls.
- No silently swallowed exceptions.
- Exceptions get logged only where they are not rethrown.
- Catch exceptions from 3rd party packages or other layers of the application, then rethrow them as the InnerException of a custom Exception type, to give more context about the error to consumers.
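That last rule is the .NET wrap-and-rethrow pattern. The same idea in TypeScript (my sketch language here, since I can’t assume your stack) looks like this – `PaymentError`, `chargeCard`, and `takePayment` are all hypothetical names, and the `inner` field plays the role of .NET’s InnerException:

```typescript
// Hypothetical custom error type that carries context for consumers.
class PaymentError extends Error {
  readonly inner: unknown; // plays the role of .NET's InnerException

  constructor(message: string, inner: unknown) {
    super(message);
    this.name = "PaymentError";
    this.inner = inner;
  }
}

// Stand-in for a third-party call that may throw a low-level error.
function chargeCard(amount: number): void {
  if (amount <= 0) {
    throw new RangeError("amount must be positive");
  }
}

function takePayment(amount: number): void {
  try {
    chargeCard(amount);
  } catch (err) {
    // Rethrow with domain context, keeping the original error attached
    // rather than silently swallowing it.
    throw new PaymentError(`Payment of ${amount} failed`, err);
  }
}
```

Consumers catch one meaningful exception type and can still drill into the low-level cause when debugging.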

The `.md` becomes a kind of living style guide, except the consumer is an agent rather than a junior dev. It’s the most direct version of “teaching the tool” I’ve ever had.

What’s actually changed

If I step back, the honest answer is that the shape of my job has changed, but not the skillset required to be good at it. I spend less time typing, more time specifying, reviewing, and orchestrating. The skill that determines whether I get good code out of the AI is the same skill that determines whether I write good code by hand: knowing what I want, and being willing to stay in the loop until I see it.

The new risk, specific to agents, is that it’s genuinely tempting to let go of the wheel. A well-described task completed by a competent agent can produce code that looks fine on a quick read. Accepting it without really understanding it is the AI equivalent of merging a PR after a 30-second glance. The muscle memory that stops me doing that is exactly the same as the muscle memory that stops me merging a bad PR. I think this is one genuinely weak aspect of AI right now – with a person, you can collaborate, pair program, and build trust so a ‘big final code review’ isn’t required. You still can’t trust AI to do what you expect, and the speed at which it can make changes means death by a thousand cuts if you don’t keep control.

Which brings me back to the government client. They still don’t allow AI tools, and I understand some of the reasons why. But the productivity gap between how I work on projects that allow AI and how I work on projects that don’t is now large enough that I notice it every day. Enterprises that keep their heads in the sand risk something more damaging than falling behind on productivity — they risk losing access to the developers who have already moved on.

Over to you

If you’re somewhere on this curve, I’d love to hear where. What was your watershed moment, or what’s the thing still holding you back? And if you’ve found a way to run three Claude Code sessions at once without losing your mind, please let me know how.

2 thoughts on “From autocomplete to agents: a year of AI-assisted software delivery”

  1. Great post, it seems we follow very similar paths still even though we haven’t worked together for a few years now. I have taken a slightly different approach recently as I’m seeing agentic workflows becoming closer aligned with what we used to do pre-AI. I can’t trust the output, don’t think I ever will, autonomous test loops help, but more and more I see it cheat, as it’s too eager to please. I have two main “rules” for my agents:

     - **Never fabricate information**: Only use what is explicitly present in your inbox, the codebase, or referenced documentation. If something is unknown, state it as unknown or raise a decision request – a confident wrong answer causes more harm than an acknowledged gap.
     - **Label inferences explicitly**: When you derive or interpret information rather than read it directly, mark it as such. Use `EXTRACTED` for direct reads and `INFERRED` for derived conclusions, especially in specifications, reports, and any structured output.

     I also have started using skills as opposed to special agents. I feel the “agent” thing is the wrong approach. But then that will all be different next week when the models change again.


    1. It’s moving so fast – you’re right, it’ll be a different game by next week.

      I like the sub-agents because I can point each toward a different model, depending on the complexity of what I want. I don’t want Opus running tests – it’s a poor use of tokens.

      I’m intrigued by the idea of skills – how close can I get Claude to making the same design decisions I would? Is that even the goal anymore? In a world where most code is written by AI, do we need microservices and SOLID principles?

