Measuring Performance. Or How Much Performance Is Your Added Performance Costing You?

June 3, 2026

Deutsch / English

A team member caught me at the coffee bar this morning. His team had been fighting an unusual volume of bugs - enough to dig into where they were coming from. Almost all of them traced back to commits from a different team. When they looked closer at GitHub, a pattern emerged: the problematic commits were all agentic. Code written not by an engineer thinking through a problem, but by an agent executing it.

His summary was blunt: "They're probably 200% as fast as normal. And that costs us 50% of our performance on the backend."

That number is not precise. It doesn't need to be. The point it makes is real.

Organizations have always found it easier to measure output than understanding. Lines of code, features shipped, tickets resolved - these are observable. The mental model a developer carries of how a system actually works, the judgment about which abstraction will cause problems in two years, the knowledge of why a particular decision was made and what it would break to reverse - none of this is visible in any metric.

This has been a management problem since software organizations existed. It's becoming a structural risk.

As AI tools produce output that is increasingly indistinguishable from expert output, the gap between what someone ships and what they understand widens. A developer can close tickets, pass code review, and ship to production while understanding their system less each week. The artifact looks the same. The comprehension isn't there.

The organizations that manage through this period will be the ones that found ways to value and preserve understanding - through code review that asks why, through architecture discussions that expose reasoning, through hiring decisions that test judgment, not just output. The organizations that measured only what they could see will discover too late that the important layer was the one they weren't tracking.

What engineers understand is what makes your system maintainable, extensible, and recoverable when something goes wrong. You can't measure it in a sprint. You notice its absence during an incident at 2am.

Thoughts? Find me on Bluesky.