Writing specs for AI agents

Feb 28, 2025

Product managers spent years complaining that engineers misread their specs. Ambiguity led to arguments, rework, and the occasional feature that bore no resemblance to what anyone had intended. Everyone agreed: specs should be clearer. Then AI coding agents arrived and started reading specs exactly as written, interpreting every sentence with perfect literalness and zero charitable assumption.

That turned out to be worse.

The old contract

For most of the history of product management, a spec was a conversation starter, not a build instruction. The PM wrote a document. The engineers read it. Where it was vague, they asked questions. Where it contradicted itself, they flagged it. Where it was incomplete, they filled in the gaps using shared context, team norms, and institutional memory.

I call this charitable interpretation. It was never written into any process document or onboarding guide. But it was the invisible layer that made product development work. Every successful product I shipped at Freshworks relied on it. My specs were clear enough, specific enough, and complete enough for a smart engineer who knew the product to build the right thing. But they were nowhere near complete enough for a reader who had no prior context, no memory of last quarter's decisions, and no ability to infer intent from tone or organisational culture.

They were written for collaborators. Not for compilers.

That distinction did not matter until the readers changed.

The moment everything surfaced

I watched a PM go through this realisation earlier this year. She had written a spec for a notification preferences feature, and by the standards we had all been trained on, it was solid. Clear user story. Well-defined acceptance criteria. Sensible scope. The kind of document any experienced engineer would have read, asked two or three clarifying questions about, and then built correctly.

She fed it to an AI coding agent instead.

The output was technically functional. Every acceptance criterion was met in the most literal sense imaginable. But the feature was wrong. Not subtly wrong. Visibly wrong, in a way that would have been funny if it had not cost three days of debugging and a missed sprint commitment.

The spec said users should be able to "manage their notification preferences." The AI built a single toggle: notifications on, notifications off. But that was it. Technically, that is managing preferences. The spec said the interface should "display current settings." The AI rendered every setting the system tracked, including internal flags the user was never meant to see, in an alphabetical list with no grouping. Technically, those are current settings.
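To make the gap concrete, here is roughly what a literal reading produces. This is my reconstruction in TypeScript, not the agent's actual output; the names and shapes are hypothetical, inferred from the spec language above.

```typescript
// Hypothetical reconstruction of the literal reading. Not the agent's
// actual code; the shape is inferred from the spec wording.

// "Manage their notification preferences", read literally: one boolean.
interface NotificationPreferences {
  enabled: boolean; // on or off, and that is the whole feature
}

// "Display current settings", read literally: every key the system
// tracks, internal flags included, alphabetical, no grouping.
function displayCurrentSettings(settings: Record<string, unknown>): string[] {
  return Object.keys(settings)
    .sort()
    .map((key) => `${key}: ${String(settings[key])}`);
}
```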

Every line in the spec was honoured. Not a single line was understood.

At Boeing, years earlier, I had seen spec ambiguity cause problems with human engineers too. A requirements document for an internal tooling project described a data validation step as needing to "confirm accuracy." Two engineering teams interpreted "accuracy" differently: one as format validation, the other as value-range checking. But the difference was caught in a review meeting within a week. Someone asked a question. Someone clarified. The charitable interpretation layer did its work. With an AI agent, there is no review meeting. There is no question. There is just the output, built to spec and built wrong.

Charitable interpretation debt

This is what I now think of as charitable interpretation debt: the accumulated distance between what a spec says and what the team understands it to mean, bridged entirely by human judgement and shared context. For years, this debt was invisible because humans were always the readers. The collaboration layer absorbed the imprecision. Engineers and PMs sat in the same standups, shared the same Slack channels, built the same mental model of what the product should become. The spec was a starting point. It was never the final instruction.

But when an AI agent becomes the reader, the debt comes due. All of it. At once.

Every assumption left unstated becomes a potential defect. Every instance of "they will know what I mean" becomes a feature that works exactly as written and nothing like what was intended. The spec has to carry the full weight of the decision now, because there is no collaboration layer to catch what falls through.

Ambiguity in a spec used to be a communication problem. Now it is a build defect.

The precision flip

I think of this shift as the precision flip. For twenty years, the best PM skill was knowing what to leave out of a spec. Brevity signalled respect. A forty-page PRD was the hallmark of a PM who did not trust their engineers. But now, what you leave out is what the AI gets wrong. The virtue flipped overnight. The instinct that made you effective last year is the instinct producing broken features this year.

The solution is not writing more. But that is what most PMs try first. I have seen them respond to this problem by adding pages of specification, trying to anticipate every possible interpretation of every possible instruction. That produces a different kind of failure: a document so dense that important decisions are buried under procedural detail, and the AI treats every instruction with equal weight because every instruction is presented with equal weight.

The skill that matters now is precision without volume. Saying exactly what you mean in exactly as many words as that requires. Not fewer. Not more. But the kind of precision that comes from having actually made the product decision before writing it down, rather than drafting something vague enough that the decision can be deferred to whoever builds it.

The spec was always the product decision. We just did not notice because humans were generous enough to fill in the gaps.

What the rewrite taught her

The PM whose notification feature went sideways rewrote her spec the following week. Same feature. Same intent. But this time, every ambiguous term was defined. "Manage preferences" meant a screen with six toggleable categories, each with an on/off state and a frequency selector offering daily, weekly, or never. "Display current settings" meant showing only user-facing preference states, grouped by category, with the current selection highlighted. The spec was not meaningfully longer. It was more precise.
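Her definitions translate almost directly into a data model. What follows is my sketch, not her spec verbatim; the field names are assumptions, but the structure is exactly what her rewritten sentences pin down.

```typescript
// A sketch of the rewritten spec as a data model. Field names are
// hypothetical; the structure follows her definitions directly.

// "Manage preferences": six categories, each with an on/off state
// and a frequency selector offering daily, weekly, or never.
type Frequency = "daily" | "weekly" | "never";

interface PreferenceCategory {
  id: string;          // e.g. "mentions" (hypothetical name)
  label: string;       // user-facing category name
  enabled: boolean;    // the on/off toggle
  frequency: Frequency;
}

// "Display current settings": only these user-facing states, grouped
// by category, with the current selection highlighted. Internal flags
// never enter this model, so they cannot leak into the UI.
interface PreferencesScreen {
  categories: PreferenceCategory[]; // exactly six, per the spec
}
```

Note what the sketch makes visible: once a spec is precise enough to fall out as a type definition, there is very little room left for a literal reader to fill with the wrong guess.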

The AI agent built it correctly on the first pass.

She told me afterwards that the exercise changed how she thought about product decisions. "I used to think the spec described the decision," she said. "Now I think the spec is the decision." That is not a small distinction. It is the entire shift.

I spent my career writing specs that left room for engineering judgement. That was the right approach for human collaborators. It signalled trust. It created space for engineering expertise to improve the product beyond what I had imagined. But that same approach, given to an AI agent, does not signal trust. It signals incompleteness. And the AI does not improve upon what you imagined. It executes what you specified. Nothing more. Nothing less. Nothing better.

The PMs who are adjusting fastest have recognised something that the rest of the field will absorb over the next year: the writing is the building now. If you have not decided precisely what "manage" means, what "display" includes, and what happens when there is nothing to show, you have not finished making the product decision. You have deferred it to a reader that cannot decide.

Product managers developed an instinct for productive ambiguity over many years, and that instinct was earned honestly. It served its purpose well. But the readers are changing, and the writing has to change with them. Not because the old way was wrong. Because the audience is different now, and writing for the audience you have is the oldest communication skill there is.
