Beyond Coding · June 10, 2026 · 40m

How Top Engineers Are Solving the Code Review Bottleneck

TL;DR

Code review is now the scaling problem: Florian Buetow points to Google's acknowledgment that review is a bottleneck. Google was reportedly at 50% AI-generated code in 2025 and is pushing toward 75%, which shifts pressure onto senior engineers and downstream review systems.
The harness can matter more than the model: In Florian's experiments, the same frontier model performed differently depending on the harness, with Claude Code working best at one point and Codex later becoming stronger for implementation work.
Specs alone were not enough, but tests plus feedback worked: His spec-driven attempt failed because models drifted from intent, while a TDD-style setup with behavioral tests and automated stop-hook feedback finally produced reliable implementation in his project.
Guardrails should encode human review comments before code reaches GitHub: Florian recommends local, fast checks like formatters, linters, Semgrep rules, security checks, and architectural tests so agents can self-correct without waiting for a senior engineer in a PR.
Architecture remains firmly human work: He says engineers still need to decide what to build, sketch the system, define module boundaries, and lock interfaces, because losing architectural understanding is how teams slide into cognitive debt and cognitive surrender.
A simple first experiment is to turn repeated PR feedback into Semgrep rules: Examples he gives include banning Python default parameter values and forcing errors to be propagated, then measuring whether the agent needs less babysitting with those rules in place.
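
To make the "ban Python default parameter values" idea concrete, here is a minimal sketch of that check. It is not a Semgrep rule and not Florian's actual setup: it uses Python's standard `ast` module as a stand-in, and the function name `find_default_params` and the sample source are hypothetical. A check like this is fast enough to run locally, so an agent could see the finding and fix it before a PR is opened.

```python
import ast

def find_default_params(source: str) -> list[tuple[int, str]]:
    """Return (line, function_name) for each function that declares a default parameter value."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        args = node.args
        # args.defaults holds defaults for positional parameters.
        # args.kw_defaults holds one entry per keyword-only parameter,
        # with None marking a parameter that has no default.
        has_default = bool(args.defaults) or any(d is not None for d in args.kw_defaults)
        if has_default:
            findings.append((node.lineno, node.name))
    return sorted(findings)

# Hypothetical code under review.
SAMPLE = '''
def connect(host, port=5432):
    return host, port

def close(conn):
    return None

def query(sql, *, timeout=None):
    return sql
'''

for line, name in find_default_params(SAMPLE):
    print(f"line {line}: function '{name}' uses a default parameter value")
```

In practice the same rule would more likely live in the team's linter or Semgrep configuration; the point of the sketch is only that a recurring review comment can become a mechanical check.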

The Breakdown

The real bottleneck in AI software engineering is no longer writing code but reviewing the flood of it, and Florian Buetow argues the best teams are shrinking human code review by pushing feedback into the agent's environment with tests, guardrails, and architectural constraints. His blunt takeaway: the harness often matters more than the model, and engineers who can define architecture and encode their judgment as rules will have a huge advantage.
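
One way to picture an architectural constraint encoded as a rule is an import-boundary test. The sketch below is an assumption-heavy illustration, not something described in the episode: the layer names `domain` and `infrastructure`, the `FORBIDDEN` table, and the function `boundary_violations` are all hypothetical. It flags any import in one layer that reaches into a layer it is not allowed to depend on.

```python
import ast

# Hypothetical layering rule: code in "domain" must not import from "infrastructure".
FORBIDDEN = {"domain": ("infrastructure",)}

def boundary_violations(layer: str, source: str) -> list[tuple[int, str]]:
    """Return (line, imported_module) for each import that crosses a forbidden boundary."""
    banned = FORBIDDEN.get(layer, ())
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for relative imports such as "from . import x".
            names = [node.module or ""]
        else:
            continue
        for name in names:
            # Compare only the top-level package against the banned layers.
            if name.split(".")[0] in banned:
                violations.append((node.lineno, name))
    return sorted(violations)

# Hypothetical module that lives in the "domain" layer.
DOMAIN_MODULE = (
    "import os\n"
    "from infrastructure.db import Session\n"
    "import infrastructure.cache as cache\n"
)

for line, module in boundary_violations("domain", DOMAIN_MODULE):
    print(f"line {line}: domain must not import {module}")
```

The human still decides where the boundaries go; a test like this only keeps an agent from quietly eroding them.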