Review Is the New Bottleneck — For Everyone, Not Just Developers
Last month I made the case that review is the new bottleneck in software engineering. Once an agent writes the code, the scarce thing is no longer typing — it is the human who can read the output, trust it, and sign for it.
That was not a software story. It was the shape of all AI-augmented knowledge work, told through the one domain where the change arrived first and showed its math most clearly. The same shift has now reached the analyst, the lawyer, the marketer, the controller, the consultant, the product manager. Quietly, and without anyone rebudgeting for it.
The bottleneck moved for everyone. Most organisations are still measuring the old one.
The People Using AI Hardest Feel Worse, Not Better
Here is the paradox that should stop a board.
The knowledge workers using AI most intensively — the early adopters, the ones who were promised the most relief — report the opposite of relief. They feel they have more work, not less. They are losing the overview. The sense of being on top of their own desk is gone.
This is not a complaint about the tools. The tools work. It is the lived experience of people whose output went up five-fold and whose grip on that output went down.
I am one of them, and I should say so plainly. I love AI. I use it excessively — six sessions across two machines at once, on a normal working day. The leverage is real and I have no intention of giving it back. And it is not just me. My own team reports the same thing. My colleagues do. My customers do. The people closest to the work, the ones using it hardest, describe the same overwhelm in the same words.
The magic is seductive — once you are hooked on it, you cannot step away. That is not a criticism of the technology. It is an honest description of what the leverage does to you. But seductive things have side effects, and the side effects are worth naming out loud rather than smiling past.
Old: I produced the work, so I understood it.
New: The machine produced the work, and now I have to understand it well enough to put my name on it.
Those are not the same task. The second one is harder. And nobody scheduled time for it.
Generation Collapsed to Zero — So Volume Exploded
The mechanism is simple, which is why it is so widely missed.
Generation got cheap. A first draft of an email, a five-page analysis, a competitive summary, a board deck, a contract clause, a research brief — each of those used to cost an hour, a morning, a day. Now each costs a prompt and ninety seconds.
When the cost of producing a draft falls to near zero, people do not produce the same amount faster and go home. They produce five times as much. Five times the drafts, the analyses, the summaries, the slide variants, the email threads. The work expands to fill the new capacity — and every one of those artefacts now needs a human to read it, check it, and decide whether it is true.
So the person who used to be the author becomes the reviewer of a flood. The keyboard moved. The accountability did not.
As I wrote in the original essay:
Generation is cheap now. Review is the work.
That line was about senior engineers in May. It is about your entire knowledge workforce in June.
Why It Feels Heavier, Not Lighter
If generation got cheaper, the work should feel lighter. It feels heavier. There are three reasons, and they compound.
Reviewing is harder than producing. This is the part the productivity decks never model. As I put it in the original essay: to check AI output properly, the reviewer must understand the domain more deeply than the original author would have needed to. Writing a passable first draft requires competence. Catching where a confident, fluent, plausible draft is quietly wrong requires mastery. Generation distributes to the many. Review concentrates on the few who actually know.
Plausible-but-wrong is exhausting to catch. A bad human draft signals its own weakness — clumsy, hesitant, visibly thin. A bad AI draft does not. It is articulate, well-structured, and confident at exactly the points where it is fabricating. The reviewer cannot skim. Every paragraph could be the one with the invented number, the misread clause, the citation that does not exist. That is a different kind of tired than writing was.
The better the AI gets, the worse the oversight gets. This is the trap nobody wants to name. From the essay, verbatim:
at 95% accuracy, people stop reading carefully. The human in the loop becomes the human asleep at the wheel.
At ninety-five percent right, your brain learns that checking is usually wasted effort — and stops. The five percent that is wrong sails straight through, with your name on it.
So every heavy AI user lands in one of two places. They drown — reading everything, working longer than before AI arrived, because the verification load is real and unbudgeted. Or they rubber-stamp — waving the output through, shipping the five percent, and losing the overview because they stopped reading. Drowning, or asleep at the wheel. That is the “more work, less control” feeling, fully explained. It is not a mood. It is arithmetic.
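To make that arithmetic concrete, here is a minimal Python sketch. Every number in it is an illustrative assumption, not a measurement: a baseline of ten artefacts a week, the five-fold multiplier, twenty minutes to review one artefact, and a five percent error rate. It also assumes, generously, that a careful review catches every flaw. The names ReviewLoad and review_load are hypothetical, chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class ReviewLoad:
    review_hours: float    # hours spent reading per week
    flawed_shipped: float  # expected flawed artefacts that go out unread


def review_load(artefacts: int, error_rate: float,
                review_minutes: float, reviewed_fraction: float) -> ReviewLoad:
    """Weekly review cost and expected unread errors for one reviewer.

    Simplifying assumption: a reviewed artefact has its flaws caught,
    an unread one ships as generated.
    """
    reviewed = artefacts * reviewed_fraction
    unread = artefacts - reviewed
    return ReviewLoad(
        review_hours=reviewed * review_minutes / 60,
        flawed_shipped=unread * error_rate,
    )


# Assumed figures, for illustration only.
baseline_artefacts = 10
artefacts = baseline_artefacts * 5   # the five-fold volume
error_rate = 0.05                    # "ninety-five percent right"
review_minutes = 20

drown = review_load(artefacts, error_rate, review_minutes, reviewed_fraction=1.0)
stamp = review_load(artefacts, error_rate, review_minutes, reviewed_fraction=0.0)

print(f"Read everything: {drown.review_hours:.1f} h/week, "
      f"{drown.flawed_shipped:.1f} flawed shipped")
print(f"Rubber-stamp:    {stamp.review_hours:.1f} h/week, "
      f"{stamp.flawed_shipped:.1f} flawed shipped")
```

Under these assumptions, reading everything costs about 16.7 hours a week that nobody planned for, and reading nothing ships an expected 2.5 flawed artefacts a week under your name. Change the inputs to your own numbers; the shape of the trade-off stays the same.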
The Loop Is the New Unit of Work
Listen to anyone using AI seriously in June 2026 and you will hear a new word. Not prompt. Loop.
I don’t write prompts anymore. I write loops.
A prompt is a request. A loop is a process. You set a goal, hand it to the agent, and it iterates toward the supposedly right answer on its own — it plans, it acts, it checks its own output, it refines, and it goes again. Each pass is meant to be better than the last. You are no longer asking for one thing. You are starting an engine that produces, judges, and produces again until it decides it is done.
This is genuinely powerful. It is also the reason review matters more now, not less. The loop did not remove the human verification step. It industrialised the thing the human now has to verify. Three dangers sit inside it, and a heavy user runs into all three.
A loop amplifies its own errors. A wrong assumption made early does not get caught by later passes — it gets inherited. The agent treats its own previous output as ground truth and builds on it. So the small misread in pass one becomes the foundation of pass two, and the load-bearing premise of pass five. The loop does not self-correct toward the truth. It self-reinforces toward whatever it started believing. Compounding works against you here, not for you.
A loop burns tokens and money when you do not know what you are doing. Every iteration is a paid call. An operator who has not set a stopping condition is paying for the agent to think in circles. From the original essay, on the FinOps reality of exactly this pattern:
A single careless `while true` against a frontier model can burn a five-figure invoice over a weekend.
The dashboard only registers it when the invoice lands on Monday. For a large programme that is a rounding error. For a small fixed-price engagement it is not survivable: the token bill can eat the entire margin before anyone has shipped a thing. The loop spends real money at machine speed, and it spends it whether or not it is making progress.
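A rough sketch of how a weekend adds up to that kind of invoice. All inputs are hypothetical: the call rate, the tokens per call, and the price per million tokens are placeholders, not any vendor's actual pricing. The function name runaway_loop_cost is invented for this example.

```python
def runaway_loop_cost(calls_per_hour: int, tokens_per_call: int,
                      usd_per_million_tokens: float, hours: float) -> float:
    """Estimated spend for a loop that runs with no stopping condition."""
    total_tokens = calls_per_hour * tokens_per_call * hours
    return total_tokens / 1_000_000 * usd_per_million_tokens


# Hypothetical inputs, for illustration only.
weekend_hours = 60  # Friday 18:00 to Monday 06:00
cost = runaway_loop_cost(
    calls_per_hour=120,           # one agent pass every 30 seconds
    tokens_per_call=100_000,      # a large context resent on every pass
    usd_per_million_tokens=15.0,  # placeholder price, not a real quote
    hours=weekend_hours,
)
print(f"Estimated weekend spend: ${cost:,.0f}")
```

With these placeholder figures the loop consumes 720 million tokens and the estimate comes to $10,800. None of the individual inputs looks alarming on its own, which is the point: the multiplication is what hurts.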
A finished loop still does not mean a correct result. This is the one that catches people who trust the process. The loop optimises toward what it judges to be correct. That is not the same as correct. It will iterate confidently toward an answer that is internally consistent, well-structured, fluent — and wrong, because the standard it is converging on is its own, not reality’s. The loop terminating is not a verdict on the output. It is a verdict on the loop. This is the trap’s sharpest edge: a loop that lands on a wrong answer is more dangerous than one that runs forever, because it does not stall or throw an error — it announces success. Nothing flags it. The result still has to be reviewed, conscientiously, by a human who knows the domain.
The loop, in other words, does not retire the reviewer. It hands the reviewer a stronger, more confident, faster-moving thing to be wrong about.
Engineers who build these loops for a living have started saying the same thing in their own vocabulary: a loop is a feedback controller, and the part that decides whether it is cheap or ruinous, stable or thrashing, trustworthy or quietly lying is the check inside it — the test that can fail the work while you are not watching. That check is the one piece you cannot hand to the AI. It encodes what “correct” means in your domain, and that lives in your head and your requirements, not in the model. The loop is the part that got automated. The check is the part that was always going to be yours. Which is review, named in engineering terms.
AI Did Not Fail — The Task Changed Category
The mistake is to read the overwhelm as a sign that AI underdelivered. It did not. It delivered exactly what was promised on the generation side. What it did was move the work from one category to another — from producing to verifying — and almost no organisation rebudgeted for the move.
This is the deeper pattern from my book, and AI is its sharpest test. Most “processes” inside companies were never processes. They were a series of workarounds held together by the people who knew which rules to ignore:
AI does not break your processes. It proves they were never processes.
A workflow that ran on a human silently catching their own errors does not survive the human being handed five times the volume to catch. The hidden review step — the one that lived inside the act of writing — got separated out, multiplied, and left unowned.
Which leads to the only question that resolves it. Also from the book, verbatim:
Who owns the AI-based outcome—by name, by value, and by when?
Not “who used AI for this.” A name on the final call. Last summer in Frankfurt I described AI as the ultimate assistant (der ultimative Assistent) that bridges knowledge and competence gaps and turns a person into a multiplier: Mitarbeiter x10, the employee times ten. That is true. But a ten-fold multiplier on generation only pays off if review capacity scales with it. Multiply the output and not the oversight, and you have not built leverage. You have built a backlog of unverified work with someone’s name implicitly attached to all of it.
What To Actually Do — Review Is Now a Core Skill
The resolution is not “use AI less.” It is to treat review and verification as a first-class skill for every knowledge worker — not a developer concern, not an afterthought, a discipline you staff and measure. Five moves.
1. Decide what to trust and what to check. Not every artefact deserves the same scrutiny. A throwaway internal summary and a clause in a signed contract carry different blast radius. Risk-based review means deciding, deliberately and in advance, which output gets trusted on sight and which gets read line by line. The failure is treating everything the same — which collapses into rubber-stamping the things that mattered.
2. Budget review capacity, not just generation. The decks promise that AI frees up time. It frees up generation time and spends it on review. Plan for that. If a team is producing five times the output, the verification hours are a real, named line in the plan — not a gap someone absorbs by working later.
3. Make review-throughput the metric. Organisations still count output — drafts shipped, decks produced, tickets closed. That is now the cheap part, and counting it rewards exactly the flood that overwhelms people. The real capacity metric is review-throughput per person: how much output a human can actually stand behind. Measure what is checked, not what is generated.
4. Keep the human awake. Ninety-five-percent accuracy is precisely where attention dies, so design against it. Sample deliberately. Separate duties, so the person who prompted is not the only one who checks. Slow down on high-blast-radius work — the contract, the board number, the customer-facing claim — even when the draft looks finished. The point is not to distrust the machine. It is to keep a human genuinely in the loop instead of nominally in the loop.
5. Control the loop, not just the prompt. If the unit of work is now a loop, manage it as one. Start with the question that decides whether a loop is even the right tool: can you write a check that fails bad output without you in the room? If “done” is a judgement call with no test that can reject a bad result, you cannot close the loop; you are only automating the production of confident slop. If you can write such a check, set it as the stopping criterion before you start, so the agent terminates on a standard you chose rather than one it invented. Then give it three brakes, not one: a step ceiling (stop after N passes), a budget ceiling (a hard token cap, so a frontier model cannot quietly think its way through your margin over a weekend, and because a single pass that spawns sub-agents can balloon on its own), and a no-progress trip (if the output stops changing, the loop is spinning, not working: kill it).
Underneath all five sits one shift in the human’s job. It moves from producing the first draft to owning the final call. The value is no longer in the typing; the machine took that. The value is in the judgement that decides what is true and what ships. Strengthen people for that work. Do not just flood them with output and call it productivity.
And the reason that final call cannot be skipped — the reason no loop, however confident, retires it — is accountability. You stand behind what comes out. You have to explain the pitch you had the agent generate, in the room, to people who will ask why. You have to defend and present the offer you had it write, in your own voice, as if you had reasoned every line. You are liable for the code when you ship a solution to a customer, no matter which model produced it. Which is why review is not a polish step you do if there is time. It is the act of accepting authorship. Review the loop’s output, conscientiously, because the moment it leaves your hands it is yours. As I wrote in the original essay:
The keyboard moved. The accountability did not.
That sentence was the whole point in software, and it is the whole point everywhere now. The agent generates. The loop iterates. The name on the outcome is still a human one, and that human had better have read it.
The Real Fix Is Invisible AI
Notice what every move so far asks of one person. Review better. Stay awake at ninety-five percent. Control the loop. Set the stopping criterion. Cap the tokens. Each one is sound, and each one loads the individual harder. We are answering “the operator is overwhelmed” with “operate more carefully.” That is necessary, and it is not the bottom of the problem.
The deeper craft is not about the doer. It is architectural.
Make the AI invisible. Stop handing every knowledge worker another tool to operate, prompt, loop and review on top of the job they already had. Embed the AI as a step inside the process instead — so the employee experiences a better process, not a second machine to babysit. From the book, verbatim:
The goal of execution is not to showcase AI. It is to make it boring.
This is also how the technology actually matures. When a system works reliably, we stop calling it AI at all. This is the AI effect, summed up in a line usually attributed to John McCarthy: as soon as it works, no one calls it AI any more. It stops being “the AI project” and becomes the fraud detection, the demand forecast, the claims routing. It becomes, as I put it at DOAG in Nürnberg last November, the service layer of the IT architecture, not an app on every desk that each person has to drive by hand.
Now the sharpening, because invisible AI could be misread as the opposite of everything above. It is not.
Hide the friction — the prompting, the loops, the plumbing. Do not hide the accountability.
Those are two different things, and conflating them is the whole danger. The review does not disappear when the AI goes quiet. It concentrates. It moves up — off the desk of everyone who used to operate the tool, and onto the person who actually signs for the outcome. Invisible to the doer; owned and reviewed at the right layer. You take the loop out of a hundred inboxes and you put the verdict in one accountable pair of hands, deliberately, by design.
And invisible is not the same as secret. Where AI shapes a real decision (a credit score, a hiring shortlist, a customer-facing claim), transparency stays. That is not only a preference; for high-risk uses such as credit scoring and hiring, the EU AI Act makes transparency a legal obligation. Boring on the desk, traceable on the record. The friction vanishes for the user. The audit trail does not.
So the synthesis closes the loop the whole essay opened. The overwhelm came from one move: turning every knowledge worker into an AI operator. The resolution is two-sided. Design the AI invisibly into the process, so the doer is not buried under tools and prompts and passes. And keep ownership of the outcome deliberate and visible at the accountable layer, so the five percent still has a name on it.
Invisible to the many. Owned by the one.
The Verdict
The bottleneck moved, and it moved for everyone.
It moved from the developer’s keyboard to the analyst’s inbox. It moved from producing to verifying. It moved from a skill a few specialists were trained in to a skill every knowledge worker now needs.
Manage it deliberately — decide what to trust, budget the review, measure the throughput, control the loop, keep the human awake, make the AI invisible and keep the ownership visible — and AI becomes the multiplier it was sold as. Ignore it, and your best people drown in their own output or rubber-stamp their way past the five percent that costs you. The relief was never going to arrive on its own. It arrives when you own the new work instead of pretending the old metric still describes it.
I made the full case in software first, where the math is hardest to argue with.
The full analysis on iamyb.com → https://iamyb.com/essay/review-is-the-new-bottleneck-what-ai-actually-changed-in-software-engineering/