AI-assisted development at Kopperfield

June 15, 2026

By Adam Hultman · 9 min read

If an engineering friend asked me how AI is going at Kopperfield, I’d give them the honest answer: better than I expected.

Yes, velocity is up. Yes, we mostly use Claude, wired into the systems we already work in through MCP servers. And yes, our CEO now ships real code while still running the company.

We’ve created practices that turn fast, AI-assisted contribution into work we can actually trust and merge.

At Kopperfield we build software for electrical contractors, where a wrong load calculation fails an inspection. The math has to be correct, and we cannot just wave AI output through. We have to treat it like mission-critical software. Here’s how that has played out so far.

AI made coding accessible, but the results weren’t shippable

AI didn’t just make our engineers faster. It opened the door for people who aren’t full-time engineers to contribute real changes. Our CEO has sharp product judgment. Suddenly he could turn an idea into a working branch in an afternoon.

That’s a real unlock, but it also became a source of pain.

Early AI-assisted work looked plausible, and the demo worked. But “close” and “ship-ready” are different states, and the gap between them is where the cost hides:

Unclear scope that crept past what the change needed to do
Architectural side effects nobody flagged up front
Missing tests, easy to skip when you’re moving fast
Weak handoff artifacts that left the reviewer reconstructing intent
Feedback arriving too late, after the finished pull request, when changing course costs the most

This produces a familiar, uncomfortable moment. The new contributor says, “I already built this and showed you. Why is it taking so long to merge?” From engineering’s side, the hard part was only beginning: scope, tests, side effects, production risk. Closing that gap is where we most needed to iterate.

It started as a schema cheat sheet

A year ago, our entire AI setup was one file. It was called CLAUDE.md, it ran 195 lines, and its only job was helping Claude write better SQL. No agents. No automation. No plan. Just a list of our database tables with a one-line note on each.

One line stood out:

# Don't join the project table to load calculations unless it's the only way

It taught us that clear, specific context about our own system is what really helps the model.

We got the upkeep wrong, though. Keeping a snapshot in sync by hand is a losing game, and we mostly didn’t keep up. The hand-written list drifted, and Claude started suggesting columns that no longer existed.

So we stopped trying to keep a schema file in sync by hand. The real schema had been in the repo the whole time: thousands of lines of Django models that change with the application itself. The better move was to help Claude find and read those directly.

That required a map. In a repo this size, Claude could not reliably discover the right models, tools, and flows on its own. So we built docs that point it through the codebase: where the schema lives, where each tool’s code sits, and how the main data flows connect. With that map, Claude can use the application code as ground truth instead of grepping blindly or trusting a stale copy.

The old schema digest still exists, but only as quick orientation now, not the source of truth. And through MCP, Claude can go past the code entirely: it can query live data and inspect the systems around it without anyone pasting context into a chat.
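Since a digest like this still sits in the repo as orientation, one cheap safeguard is a check that flags entries the real schema no longer backs up. The following is a minimal illustrative sketch in Python, not the tooling described in this post. The digest format (`table: col, col`), the function names, and the example tables are all assumptions.

```python
def parse_digest(text: str) -> dict[str, set[str]]:
    """Parse a hypothetical digest format: one 'table: col_a, col_b' per line."""
    tables: dict[str, set[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, notes, and anything that is not a 'table: columns' line.
        if not line or line.startswith("#") or ":" not in line:
            continue
        table, cols = line.split(":", 1)
        tables[table.strip()] = {c.strip() for c in cols.split(",") if c.strip()}
    return tables


def find_stale_columns(
    digest: dict[str, set[str]], live_schema: dict[str, set[str]]
) -> dict[str, list[str]]:
    """Return digest columns that no longer exist in the live schema."""
    stale: dict[str, list[str]] = {}
    for table, cols in digest.items():
        live_cols = live_schema.get(table)
        if live_cols is None:
            # The whole table is gone, so every listed column is stale.
            stale[table] = sorted(cols)
            continue
        missing = sorted(cols - live_cols)
        if missing:
            stale[table] = missing
    return stale


# Hypothetical example data: the digest still mentions a renamed column.
digest = parse_digest(
    "# schema notes\n"
    "load_calculation: id, project_id, panel_amps\n"
    "project: id, name\n"
)
live = {
    "load_calculation": {"id", "project_id", "service_amps"},
    "project": {"id", "name"},
}
print(find_stale_columns(digest, live))
```

In a Django project, the live side could plausibly be built from `django.apps.apps.get_models()` and each model's `_meta.get_fields()`. That wiring is left out here to keep the sketch self-contained.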

Bottom line: writing down our world helped, but pointing the model at the parts of the system that already stay true is what kept it from rotting. That was step one toward shippable.

Specialists are easy to build, hard to trust

Our first subagent was a SQL helper for our internal reporting tool, a markdown file with a focused prompt Claude could lean on when a question turned into a query.

The lesson came from getting it wrong first. Our early specialists were padded with instructions on how to think about the craft, how to write good SQL, how to structure a query, and almost none of it mattered, because the model already knows that. The ones that earn their keep are lean: they don’t teach Claude its craft, they hand it our world and its edges, which tables are tenant-scoped, which mutations need a transaction, which files are legacy.

The harder lesson came from a specialist we built and then backed away from. We shipped a GitHub Action so anyone could mention @claude on a pull request and get a review. It worked, and it went nearly unused, because no one triggers a one-off, confused @claude run when they can just open a Claude tab and converse directly. We could have forced the issue and run it on every pull request automatically. We deliberately didn’t: the reviews weren’t good enough to sit in the path of a merge, and a weak reviewer that fires on everything doesn’t raise the bar, it just teaches engineers to ignore it.

That’s the line we’ve held since. A tool only earns a place in the critical path once its output is worth trusting there. Narrow specialists earned it quickly, because their job was small and checkable. Reliable code review took longer, and in the end it took a different reviewer than this one, which we’ll get to.

Tools fixed retrieval, not judgment

For a while, triaging a bug meant opening Sentry in the browser, copying the stack trace, pasting it into a Claude tab, and asking for help. Tedious, and easy to skip.

Then we connected our tools through MCP. Now you type a short command and Claude pulls the issue, the related events, and the breadcrumbs in one shot. It can query our task tracker, search our PR history, or check which accounts are on a trial, all without leaving the session.

This was the real turning point. Not any single integration, but the fact that Claude could suddenly do far more inside one session. It also showed us where tools stop helping. A few days after we wired up all this context, a database migration slipped through with a locking risk none of our checks caught, and we rolled it back within minutes. Claude had reviewed the pull request and hadn’t flagged it, because the danger lived in runtime behavior it couldn’t see, not in the code it could. Tools fix retrieval gaps. They don’t fix knowledge gaps.

That scare led straight to our first hard guardrail.

Some rules can’t depend on judgment

Not long after, we shipped our first hook, a small script called protect.sh. It blocks two things: edits to environment files and edits to migrations that have already shipped. When it blocks, it tells Claude exactly why, so the model redirects on its own without anyone stepping in.

It exists because of near-misses. Claude tried to edit a committed migration to “fix” a column type, which is one of the few truly unrecoverable moves in our stack. So we made it impossible.
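The decision a hook like this makes is small enough to sketch. The version below is a hypothetical Python rendering of the two checks described above, not the contents of protect.sh. The filename patterns, the `shipped_migrations` set, and the wording of the messages are assumptions, and how the result is reported back to the model depends on the hook mechanism, which is left out.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Assumed filename patterns for environment files.
ENV_PATTERNS = (".env", ".env.*")


def check_edit(path: str, shipped_migrations: set[str]) -> tuple[bool, str]:
    """Return (allowed, reason). The reason is written for the model to read."""
    p = PurePosixPath(path)

    # Gate 1: never edit environment files.
    if any(fnmatch(p.name, pattern) for pattern in ENV_PATTERNS):
        return False, (
            f"Blocked: {path} is an environment file. "
            "Ask a human to change configuration instead of editing it."
        )

    # Gate 2: never edit a migration that has already shipped.
    # How the shipped set is computed (for example from the main branch)
    # is outside this sketch.
    if str(p) in shipped_migrations:
        return False, (
            f"Blocked: {path} is a migration that has already shipped. "
            "Create a new migration instead of editing this one."
        )

    return True, ""


# Hypothetical paths for illustration.
shipped = {"loads/migrations/0007_panel.py"}
for candidate in (
    "loads/migrations/0007_panel.py",
    "loads/migrations/0008_new_field.py",
    "config/.env.production",
):
    print(candidate, check_edit(candidate, shipped))
```

The message matters as much as the block: a reason that names the alternative gives the model something to redirect toward.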

The principle underneath it: a rule that depends on the model’s judgment isn’t a rule. It’s a preference. If a violation is recoverable, a preference is fine. If it means data loss, a security hole, or money moving the wrong way, you encode it as a hard gate. That’s also how we kept review from becoming personal. When a gate blocks a change, it’s not one engineer second-guessing another. It’s the same rule, every time, for everyone, including the CEO.

We started writing down our scars

By early 2026, our output was outpacing our review. Common mistakes from the coding agent were slowing down merges, and faster output without matching checks was creating debt.

The fix was an in-repo file we call LEARNINGS.md, plus a quick command to keep it fed. It captures what a fresh session would trip on: failure patterns from past sessions, non-obvious invariants in our own code, local environment gotchas, and workarounds for tooling that drifts out of sync.

We overdid it at first, logging trivial preferences like quote style until the file became noise. We trimmed hard. The test now is simple: would a fresh session trip on this without the entry? If not, delete it.

That file is the compounding loop. Every gotcha we write down is one the system catches next time, without a person having to remember it.

The same idea covers the rest of the docs. We have commands and a doc-updater agent to regenerate them as the code moves, on the same principle as the schema: don’t trust memory to keep a copy true. We don’t always win, docs still drift, but the closer a doc sits to something the system can regenerate, the less it rots.

The fixes that actually closed the gap

On the human side, we tightened how work moves between people:

Changes arrive with handoff context: what it does, what it touches, what it deliberately leaves alone.
We agree on scope early, before the agent runs too far in the wrong direction.
If a change is likely to move architecture, that’s a conversation, not a quick branch.
We give feedback while the work is in flight, not only at the finished PR.

On the engineering side, we let the system enforce the bar:

Lint rules that exposed and reduced real bugs, not just style nits.
Tests and passing checks as hard gates. Changes ship with tests, and those tests pass. No exception for speed.
AI review with the right reviewer. Claude wasn’t strong enough to give us reliable review signal, so we kept looking. Propel cleared the bar Claude couldn’t, and it’s the reviewer we trust for that signal now.
Human sign-off that still matters. A named engineer stands behind every merge.
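
The engineering-side gates above can be read as a single merge predicate. What follows is a minimal sketch under stated assumptions, not a description of a real CI configuration. The `Change` fields and the check names are hypothetical, and lint and AI review are assumed to appear as entries in `checks`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    touches_code: bool
    has_tests: bool
    checks: dict[str, bool]            # check name -> passed
    approved_by: Optional[str] = None  # the named engineer behind the merge


def gate_failures(change: Change) -> list[str]:
    """List every gate the change fails. An empty list means it can merge."""
    failures: list[str] = []

    failed = sorted(name for name, passed in change.checks.items() if not passed)
    if failed:
        failures.append("failing checks: " + ", ".join(failed))

    # No exception for speed: code changes ship with tests.
    if change.touches_code and not change.has_tests:
        failures.append("code changed without tests")

    if not change.approved_by:
        failures.append("no named engineer has signed off")

    return failures


def can_merge(change: Change) -> bool:
    return not gate_failures(change)


# Hypothetical change that looks demo-ready but is not ship-ready.
demo_ready = Change(
    touches_code=True,
    has_tests=False,
    checks={"lint": True, "unit-tests": False},
)
print(gate_failures(demo_ready))
```

Returning the full list of failures, rather than stopping at the first, gives a contributor everything that stands between the branch and a merge in one pass.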

Bottom line: the conversation fixes set expectations, and the gates held them. Together they turned “demo-ready” into “ship-ready” without making every quality call a personal one.

Leadership matters more, not less

Our engineering work has shifted. Less of it is typing code, and more of it is helping the company use tools well: setting scope, building the guardrails, deciding what “shippable” means. And people with strong product instincts but no traditional engineering background, our CEO among them, can now turn an idea from a customer conversation into a reviewed, tested change in days instead of a line item in a backlog. For a company our size, that’s the difference between shipping like a small team and shipping like a much larger one, without lowering the bar that keeps our customers’ numbers right.

Where it goes next follows the same line. The context layer is now rich enough that a terse instruction produces work we’d actually merge, because the model already has our schema, our conventions, and our scars in front of it. The end state we’re building toward is not AI that writes all the code. It’s a system that turns less and less human direction into more and more shippable change, while the judgment, what to build, what risk is acceptable, where the bar sits, stays with us.

If you’re trying to make AI work on a team that can’t afford to be wrong, our advice is that speed follows safety, not the other way around.