Review Is the New Bottleneck — What AI Actually Changed in Software Engineering
The productivity decks are right about one number and wrong about the system around it. AI compressed the part of software that was already the cheapest. The discipline that ships defendable, operable, auditable code did not get cheaper — it got more visible. For the executive deciding what to build, what to buy, what to acquire, and what to staff, this is what the dashboard is missing in May 2026.
The number in every consulting deck this year is N× faster: 30%, 55%, 100×. All of them describe the typing step. None of them describe the work. Software engineering, broken into its actual disciplines (requirements, architecture, code, testing, security, deployment, operations, observability, documentation, evolution, accountability), has a typing step that is roughly ten to twenty percent of total effort. The other eighty to ninety percent did not change category. It changed visibility. The cost of skipping it shifted from invisible to immediate.
What Vibe-Coding Actually Is
Andrej Karpathy named the trend in February 2025: “give in to the vibes, embrace exponentials, forget that the code even exists.” Collins Dictionary made vibe coding its 2025 Word of the Year.
The credible coding-agent tools in May 2026 are Claude Code, OpenAI Codex, GitHub Copilot, and Amazon Kiro — AWS’s spec-driven agentic IDE, launched July 2025, that prompts the agent to write a spec first and only generates code after the human approves. The IDE that ends vibe coding, sold to people who paid for vibe coding twelve months ago.
In honest use, vibe-coding compresses the typing step. First drafts in minutes. Sprints from two-to-three weeks to 48 hours. Real — and a partial story.
What Vibe-Coding Does Not Replace
What professional software engineering still requires when an agent is producing the code:
- Requirements engineering. If the spec lives only in the prompt, you do not have requirements — you have improv.
- Architecture with a rationale. Three valid answers: the team can run it, the strategy needs it, legacy forces it. “The model thought it was a nice idea” is not on the list.
- Testing, security, CI/CD, observability, documentation. Coverage is now an AI-output verification problem. The prompt is not the documentation.
- Legal, data protection, accessibility. EU AI Act fit, personal-data scope, WCAG-grade accessibility tested with people who actually rely on assistive technology.
- Liability. Whose name is on the production deploy?
The discipline of engineering did not get cheaper when the typing got faster. The cost just moved.
The Brownfield Reality No One Puts on the Slide
The 55%, the 100×, “the senior who hasn’t written code since December” are greenfield numbers. In a grown brownfield enterprise — twenty years of acquisitions, a partially-documented monolith — the realistic productivity gain from a coding assistant is more like ten to twenty percent. The senior who has been on that system since 2010 can be woken at 02:00 and tell you which of three places the bug almost certainly is. The LLM cannot. The half-fixed-in-2014 race condition, the comment in German that says “DO NOT TOUCH”, the integration constraint nobody documented because everyone already knew — that knowledge lives in people, not in the repo.
Very large codebases cannot simply be handed to an agent. “Here are five million lines, understand them and modify” does not fit the context window, and even where a smaller codebase does fit, the agent has no model of the call graph or of which modules are load-bearing. The practical answer is code graphs: symbolic indexing, dependency-graph extraction, static-analysis pipelines that pre-digest the codebase into chunks. Real engineering work. No prompt replaces it.
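As a rough illustration of what dependency-graph extraction means in practice, here is a minimal sketch in Python. It assumes a Python codebase and uses only the standard-library ast module; the three module names and their contents are hypothetical, inlined so the example runs as-is. A real pipeline would walk the repository, handle relative imports and other languages, and go well beyond import edges.

```python
import ast
from collections import Counter


def imported_modules(source: str) -> set[str]:
    """Top-level module names imported by one Python source string."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Relative imports (level > 0) are skipped in this sketch.
            found.add(node.module.split(".")[0])
    return found


def build_graph(sources: dict[str, str]) -> dict[str, set[str]]:
    """Map each in-repo module to the in-repo modules it imports."""
    known = set(sources)
    return {
        name: (imported_modules(src) & known) - {name}
        for name, src in sources.items()
    }


def fan_in(graph: dict[str, set[str]]) -> Counter:
    """Count how many modules depend on each module (a rough load-bearing signal)."""
    return Counter(dep for deps in graph.values() for dep in deps)


# Hypothetical three-module repository, inlined for illustration only.
SOURCES = {
    "billing": "import os\nimport ledger\n",
    "invoices": "from ledger import post\nimport billing\n",
    "ledger": "import decimal\n",
}
GRAPH = build_graph(SOURCES)
RANKED = fan_in(GRAPH).most_common()  # modules with the most dependents first
```

In this toy repository the ledger module has the highest fan-in, which is the kind of signal that tells an agent, or a reviewer, which chunk to treat as load-bearing before touching it.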
Architecture Is Still a Conscious Decision
The hardest part of an enterprise software decision is not what to type — it is what to choose. The implicit knowledge of what exists, who owns it, who signs for what, lives in the seniors who have been there for a decade. The decision is still the human’s. AI proposes, validates, challenges; it does not decide. The Solution Architect role becomes where the highest-leverage cognitive work happens. The same is true for UI and UX: generative agents produce a recognisable visual signature, so a designer is required for a product that needs to look like itself.
The McKinsey Lesson
On March 9, 2026, the red-team firm CodeWall hacked McKinsey’s internal AI assistant Lilli end-to-end, read and write, in two hours. The route in was banal: of more than 200 publicly documented API endpoints, 22 required no authentication at all, and JSON field names were concatenated into SQL queries without sanitisation. CodeWall reported access to 46.5M chat messages, 728k files, 57k user accounts, and the 95 system prompts that controlled Lilli’s behaviour. The prompts were stored in the same database, so an attacker with write access could quietly rewrite how the assistant answered every consultant. McKinsey patched within 24 hours; no client data was accessed. Worth reading: CodeWall’s report; The Register.
Lilli was internal, not client-facing — fair caveat. The underlying point stands: the failure mode itself is exactly the one vibe-coded production systems produce by default. The API surface grew faster than the security review; the prompt store and the user data shared a perimeter.
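The injection class described here, client-supplied JSON field names concatenated into SQL, has a well-known general remedy: check field names against an allow-list and pass values as bound parameters. The sketch below illustrates that remedy using Python's standard sqlite3 module. It is not a reconstruction of Lilli's code or of the fix McKinsey applied; the table, the columns, and the build_query helper are hypothetical.

```python
import sqlite3

# Hypothetical allow-list: the only columns a client may filter on.
ALLOWED_FIELDS = {"author", "channel"}


def build_query(filters: dict[str, str]) -> tuple[str, list[str]]:
    """Build a parameterised query; reject any field name not on the allow-list."""
    unknown = set(filters) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unsupported filter fields: {sorted(unknown)}")
    fields = sorted(filters)
    # Field names come only from the allow-list; values are bound, never concatenated.
    where = " AND ".join(f"{name} = ?" for name in fields) or "1 = 1"
    return f"SELECT body FROM messages WHERE {where}", [filters[n] for n in fields]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (author TEXT, channel TEXT, body TEXT)")
conn.execute("INSERT INTO messages VALUES ('ana', 'general', 'hello')")

sql, params = build_query({"author": "ana"})
rows = conn.execute(sql, params).fetchall()

# A hostile "field name" carrying SQL is rejected before any query is built.
try:
    build_query({"author = '' OR 1=1 --": "x"})
    rejected = False
except ValueError:
    rejected = True
```

The design choice worth noting is that the field name never reaches the query string unless it matches a known column, so the sanitisation question does not arise for keys at all.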
BCG booked $3.6 billion — 25% of 2025 revenue — from AI consulting. Both firms have published software-engineering plays this year. Their core business is advice and transformation, not building and operating production software systems. Ask which production system they themselves built and operate. The silences are the procurement filter.
In the Same Week — McKinsey’s Rewiring Paper
While I was finishing this post, McKinsey published Rewiring software delivery for the agentic era, the closest thing to a public roadmap for the system this piece argues for: 24-hour sprints, multi-agent workflows over shared knowledge graphs, a “two-shift digital factory”, −60% time and −60% team. Direction of travel: the same. Four lines belong next to McKinsey’s charts:
- Brownfield reality. −60% is greenfield. Reading it as a universal benchmark queues up the next pilot-purgatory wave.
- Review bottleneck. Eliminating human handoffs is right. Handoffs were also verification checkpoints. The cost moves; it does not vanish.
- Knowledge-graph build. A six-to-twelve-month engineering programme per enterprise. Naming the need is not the same as shipping the system.
- Liability column. Whose name is on the production deploy when the agent’s spec was wrong?
Twelve weeks before this paper was published, McKinsey’s own Lilli was hacked end-to-end in two hours via 22 unauthenticated endpoints. Buy the framework from someone who has shipped one — not from someone whose own internal chatbot was open to the public internet ninety days ago.
The New Bottleneck — Review
Once typing is no longer the bottleneck, review is. Hundreds of thousands of lines generated overnight must be read. Thousands of test cases need checking against the failure modes that matter. Dozens of library updates per merge need confirming as compatible and free of known CVEs. Every spec the agent produces contains a new architectural decision that must survive a steerco minute.
And review is only as honest as the disciplines beneath it: requirements engineering, architecture, coding patterns, testing, security, DevOps, observability, documentation, run. Has the reviewer kept command of all nine, or has the agent quietly taken each one in turn? Do they know what was built? Do they trust it? Would they sign for it?
Two paradoxes follow. First, reviewing AI output is harder than producing the original: the reviewer must understand the domain more deeply than the original author would have needed to. Second, the better the AI, the worse the human oversight: at 95% accuracy, people stop reading carefully. The human in the loop becomes the human asleep at the wheel.
Generation is cheap now. Review is the work. Review-throughput per senior is the right capacity metric in 2026.
The Spotify Test — What Does “Done” Mean?
In Spotify’s Q4 2025 earnings call, co-CEO Gustav Söderström said the company’s best developers have not written a single line of code since December. They use an internal system called Honk that integrates Claude Code. As reported by TechCrunch: an engineer on the morning commute opens Slack on the phone, tells Claude to fix a bug or add a feature to the iOS app; Claude pushes a new build back into Slack; engineer merges to production before reaching the office.
The headline is true and incomplete. Customers expect maximum stability: an overnight release that breaks playback on Monday costs more than the productivity gain that produced it. The senior who has not written a line of code since December is still the person who signs that the output does what the spec promised, that it did not break the playlist engine, that it does not leak data, that it is safe to ship. The keyboard moved. The accountability did not. Senior engineering work changed from production to verification, which needs more juniors in the pipeline, not fewer.
Bringing the Developers Along
The change-management cost of moving an engineering organisation from writing to reviewing is the most under-budgeted variable in the transition. Developers are passionate — they love writing code. Telling a fifteen-year senior that their job is now to read what an LLM produced and sign for it is a professional identity shift, and done poorly it triggers the dynamic that has killed every off-shoring programme of the last twenty years: quiet attrition of the best people, who join the next company before the transformation team notices. What works is participation in designing the new workflow — which gates, which guardrails, which review queue, which kill-switches. There is no version of the AI-augmented SDLC that succeeds with a hostile senior bench.
AI-Shoring — A Different Staffing Geography
For two decades the European cost-out playbook was the same: hire an army in Bangalore or Sofia, accept the cultural distance, defend the rate-card savings. The bottleneck moved. The lever moved with it.
AI-Shoring — onshore + small + senior + AI — increasingly beats offshore + large + mixed-seniority on durable delivery. Thirty engineers become ten, not zero. A realistic pattern is 10 onshore + 20 nearshore + AI rather than 100 offshore; the 10 are where the architectural decisions live. Sprints compress from three weeks to one. The customer becomes the bottleneck.
Custom Software and the M&A Window
Two macro consequences follow.
First, the appetite for custom solutions rises. A system genuinely tailored to one company’s workflow is now within budget. The signal is visible in the incumbents: Salesforce CEO Marc Benioff publicly claimed AI agents had replaced 4,000 customer-support workers; days later the company filed to permanently lay off 262 employees from its San Francisco HQ. Revenue still growing, but the posture has shifted. SaaS incumbents are using AI to compress their own cost base while simultaneously enabling the custom alternatives that compete with their core product.
Second — and this is the executive call almost nobody is making — the consolidation window is open now. The vibe-coding hype has driven a wave of “we don’t need software companies anymore” commentary. Most boards are listening. Many are quietly cancelling acquisitions of software firms on the assumption that target values are about to collapse. The reverse is true. A competent software firm — small, senior, pipeline-mature, AI-fluent — is worth more right now than it has been in a decade, because the market is mispricing it on the headline narrative. If a software boutique is on your strategy slide, the time is now, not when the market has corrected.
AI-FinOps — The Hidden Cost Variable
Every AI-augmented system carries an operating cost that had no line item ten years ago: the meter that runs every time the model is called. It is priced by a third party, changeable at their discretion, and invisible to the steerco until the invoice arrives. Four directions can move per-request cost by a third or more without warning:
- Per-token rates can rise. Multi-model architectures are multi-billing architectures.
- The tokenizer can change. Opus 4.7’s new tokenizer consumes up to 35% more tokens for the same input. Per-request cost up by a third with no rate change.
- Usage patterns shift faster than budgets. Token-per-feature drifts upward weeks before finance notices.
- Uncontrolled use in the dev environment is the most expensive variant. A single careless “while true” loop against a frontier model can burn a five-figure invoice over a weekend, and the dashboard only registers it when the invoice arrives.
For a €10M programme, €250–500k of tokens is a rounding error. For a €10,000 fixed-price engagement, €10,000 of tokens is the entire margin.
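The arithmetic behind these figures is simple enough to write down. The sketch below is illustrative only: the per-million-token rate and the baseline token count are assumed placeholders, not any vendor's actual pricing. Only the 35% tokenizer inflation and the euro amounts come from the text above.

```python
def per_request_cost(tokens: int, rate_per_million: float) -> float:
    """Cost of one request at a flat per-million-token rate."""
    return tokens / 1_000_000 * rate_per_million


def token_share(token_spend: float, contract_value: float) -> float:
    """Token spend as a fraction of the contract value."""
    return token_spend / contract_value


# Assumed placeholders: 20,000 tokens per request at 10.00 per million tokens.
BASE_TOKENS = 20_000
RATE = 10.0

before = per_request_cost(BASE_TOKENS, RATE)
# Same input, same rate, but a tokenizer that emits 35% more tokens.
after = per_request_cost(round(BASE_TOKENS * 1.35), RATE)
increase = after / before - 1  # roughly 0.35, with no rate change at all

# Shares from the text: 250-500k on a 10M programme versus 10k on a 10k engagement.
large_low = token_share(250_000, 10_000_000)   # 2.5% of contract value
large_high = token_share(500_000, 10_000_000)  # 5% of contract value
small = token_share(10_000, 10_000)            # 100% of contract value
```

The same token bill that is a few percent of a large programme equals the full contract value of a small fixed-price engagement, which is why the exposure has to be priced before the contract is signed.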
The operational answer is AI-FinOps — per-team quotas with hard ceilings, weekly tokens-per-feature publishing, model-routing owned by an architect, anomaly alerts at incident severity. Into the SDLC pipeline from the start, alongside the security and observability gates.
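Two of those controls, the hard ceiling and the anomaly alert, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than a production design: TeamBudget, QuotaExceeded, and is_anomalous are hypothetical names, the threshold factor of 3 is arbitrary, and a real system would persist usage centrally and check the budget in a gateway in front of the model API.

```python
from dataclasses import dataclass


class QuotaExceeded(RuntimeError):
    """Raised when a charge would take a team past its hard ceiling."""


@dataclass
class TeamBudget:
    ceiling_tokens: int
    used_tokens: int = 0

    def charge(self, tokens: int) -> None:
        """Record usage before the model call; refuse the call at the ceiling."""
        if self.used_tokens + tokens > self.ceiling_tokens:
            raise QuotaExceeded(
                f"charge of {tokens} would exceed ceiling {self.ceiling_tokens}"
            )
        self.used_tokens += tokens


def is_anomalous(history: list[int], today: int, factor: float = 3.0) -> bool:
    """Flag today's usage when it exceeds `factor` times the trailing mean."""
    if not history:
        return False
    return today > factor * (sum(history) / len(history))


budget = TeamBudget(ceiling_tokens=1_000)
budget.charge(600)
try:
    budget.charge(500)  # would reach 1,100: refused, nothing is recorded
    blocked = False
except QuotaExceeded:
    blocked = True

runaway = is_anomalous([100, 120, 110], today=900)  # a runaway loop in dev
normal = is_anomalous([100, 120, 110], today=150)   # ordinary daily variation
```

The point of raising an exception rather than logging a warning is that a hard ceiling stops the runaway loop on Saturday instead of reporting it on Monday.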
The Real Complexity — Everything Has to Plug Together
Every assertion of “100× faster with AI” runs into the same reality the moment it leaves the slide. An AI-augmented SDLC has to integrate with the toolchain the enterprise already operates: ServiceNow, Jira, GitHub Enterprise, CI/CD, observability, identity and access management, data classification, works-council (Betriebsrat) agreements. None of it is glamorous; all of it is non-negotiable. An “agentic SDLC pipeline” is a system-integration programme, not a tooling decision. Anyone selling it as the second is selling you the slide.
Many firms now write about agentic SDLC pipelines. Many talk about them. Many pitch them. Building and operating one that is functional, secure, and reliable enough to run a regulated business on is a different story. It is a craft — years of operating production software, scars from incidents that never made it onto a slide, judgment that compounds across hundreds of deployments. Writing about a pipeline is not building one. Pitching one is not running one. The distance between a published playbook and a pipeline that survives 02:00 on a Sunday is the entire game.
Junior Engineers — Pair-Programming with the Agent
A pipeline that produces only seniors has a fifteen-year half-life. The organisations winning quietly are the ones that took AI savings and reinvested in a junior pipeline — juniors who pair-program with the agent as their daily practice, learn what the agent gets wrong, and become tomorrow’s reviewers. Universities need a mandatory new curriculum on top of the fundamentals: how to develop software with agentic AI. You cannot develop software with AI if you do not know how to develop software. The organisations that take AI savings straight to the bottom line will find, in five years, that they have nobody who can review what the agents produce — and the agent’s output will be ungoverned.
The Bottom Line
Vibe-coding is not the enemy of software engineering. It is one part of the pipeline that finally got cheap. The other parts — the ones that separate a working demo from a system that can run a regulated business — got more important, not less. A team that ships vibe-coded code into production without engineering discipline is running into a debt schedule it has not yet seen. A consulting firm that sells you a pipeline it has never operated is selling you a slide.
Compression is genuine. Liability did not move. Review is the new bottleneck. The pipeline is the product. The seniors are the moat. The window to acquire one is open now. Sprints went from three weeks to one day. The art is the review.
Engineering is not the bottleneck on the typing. It is the discipline that lets a business defend what came out the other end. The firms that internalise that will be the ones still shipping in 2030. The firms that do not will be the ones whose chatbots get hacked in two hours, and whose strategy decks no longer match the system the business is actually running.