The best token count is no token count.

Jun 25

Earlier in my career I was on a team at Walmart building internal software to balance inventory across the distribution network. The leadership had been hired in from Home Depot to build the same system they'd built there, except it became clear pretty quickly that they didn't know how they'd built it. So the roadmap was spray-and-pray. A new set of top priorities almost every week, all of them urgent. I distinctly recall my manager telling me at one point, with complete sincerity, "Yes, everything is a top priority, but this one is a burning top priority."

You can guess who got rewarded on that team. Not the people doing careful work, and not the ones trying to hold some strategic line week to week. The people who got praised shipped the highest volume of stuff, and it didn't matter that most of it didn't survive to the next week's reprioritization. People pulled all-nighters in the first quarter of a five-year build and got promoted for it.

That wasn't bad luck or bad people. It's what happens when leaders can't tell good work from bad. If you don't know what good looks like, you can't reward good outcomes, because you can't recognize one. So you reward what you can easily see, which is volume. Counting how much got done is what management looks like once you've lost the ability to judge whether any of it mattered.

The entire tech industry does a version of this every few years. It falls in love with a number, and almost every time it's a number that counts effort instead of results. Lines of code. Story points. Commits per week. Everyone agrees to chase it, and a year later the team is very good at moving the number, often without any measurable improvement to the underlying product. The productivity metric measures how hard everyone worked. It has nothing to say about whether the work was worth doing.

The newest number is tokens.

Let’s be clear: code is not the thing you're making. Code is the cost of the thing you're making. The value is what the code does for somebody. The code itself is what you have to maintain, debug, secure, and eventually delete when the abstraction turns out wrong. Jeff Atwood wrote it down in 2007: "the best code is no code." Get the result without adding to the pile of things that can break.

And a token isn't even code. It's a unit of text the model reads and writes, the raw material it burns through to produce some. Lines of code were already a bad proxy, counting what you make instead of what it does. Tokens take that bad proxy and step one further back, counting the raw material consumed to produce the thing that was already the wrong thing to count. A measurement of a measurement, each layer further from anything a customer would pay for. Counting lines of code at least booked a liability you could see; counting tokens books the cost of producing that liability and calls it progress.

The whole reason anyone got excited about AI coding was the ratio. A few dollars of compute doing work that used to cost weeks of salary. That was the leverage, until people started competing over who could spend the most, which destroys the only thing that made the tool interesting. If you're proud you spend hundreds of thousands a month on Claude, you've forgotten the appeal was that it was supposed to be cheap. It's like coming home from the store, holding up the receipt, and bragging about the total before anyone's asked whether you ate well or threw half of it out. The receipt is not the meal.

I'm building a product with these tools right now. A while back I went looking for why my costs were creeping up and found the system had been firing off paid API calls against placeholder records that should never have triggered anything, plus a whole class of duplicate rows because two accounts were syncing the same source and nobody had told them not to. None of it was written by the one person who'd have to answer for it: me. It was generated, it ran, and it sat there costing money until I thought to look. After enough of that, you stop reading what the agent produces, because reading it is slower than re-rolling the dice, and now you've got files nobody has read, including you. The codebase stops being something you built and becomes something that accumulated. You're not writing software. You're accreting it.

This isn't only my experience. The largest analysis I’ve found, GitClear's study of 211 million changed lines of code, showed that once volume became the point, the share of work spent cleaning up and simplifying old code fell by more than half, while copy-pasted code climbed year after year.

To be fair, the lazy version of this argument just dunks on people for moving fast, and that version is wrong. Sometimes the firehose is right. If you're a startup before product-market fit, most of your code is going to die anyway. You'll learn something from a customer in three weeks that makes you delete half of what you built. Careful craft on code that won't survive the next pivot is its own kind of waste. Burning a pile of tokens to test five ideas this week instead of one is the cheapest market research you'll ever buy. Speed to learning is the only thing that matters before you've found the thing that works.

What changes the calculation is blast radius. Disposable code is fine for a landing page. It is not fine for the service that handles your logins, or the ledger that moves money, or the table that stores medical records. There, a subtle bug isn't a bad afternoon. It's a breach, a fine, or someone getting hurt.

The most common real situation is the overlap. Even a three-week-old startup has an auth flow and a payments table, and those are exactly the parts you can't vibe your way through, no matter how early you are. The rule isn't "early means fast everywhere." It's fast on the throwaway parts, careful where being wrong is expensive, and that line runs through even the youngest codebase.

Good engineers already work this way. They let the agent run wild on the disposable parts and tighten up where the cost of being wrong is high. And they lean on the one thing that lets you run code you didn't read line by line: tests and monitoring that fence in what the system can do. You don't have to understand every line if you've constrained the behavior and you'll hear about it the moment it steps out of bounds. People argue as if the only options were reading every line yourself or trusting the machine blind. There's a third way, and serious teams use it.

The same pattern shows up beyond code. Teams rush to put AI into the product before anyone checks whether it's the right tool, or the cheapest tool, for the problem in front of them. The AI is the input they're proud of. Whether it solved anything is a separate question nobody stopped to ask.

You don't have to take my word on the gap between feeling fast and being fast. A research outfit called METR ran a careful test: sixteen experienced developers, in codebases they knew, doing real tasks from their own backlogs, with each task randomly assigned to either allow AI or not. It's one study with a small sample, so hold it loosely. Going in, the developers expected AI to make them about a quarter faster. Afterward, they still believed it had sped them up. The stopwatch said they were about 19 percent slower. They were slower, they felt faster, and they couldn't tell the difference.

Notice where that showed up: experienced people in mature code they knew well. The high-blast-radius case, not the throwaway prototype. The firehose can win on greenfield, where there's nothing to break and the point is to learn fast. The trouble starts when you carry the prototype's habits into the code where being wrong is expensive, and the tool keeps telling you you're flying.

So the question isn't whether the tools are good. I use them every day. They're good the way any sharp tool is good, when you know what you want and you're paying attention.

The question the industry forgets to ask, every time it falls for a new number, is "more of what?" More features, if they're the wrong ones, is more surface for users to wade through and engineers to maintain. More code nobody reads is more places for a bug to hide. More tokens is a bigger bill. Each one counts the input and ignores the only thing that pays: the outcome.

The goal was never to stop counting inputs. Inputs aren't the enemy. An input untethered from any outcome is the enemy. The number that matters is the ratio: how much customer problem you solved for each dollar, each line, each token you spent getting there. Nobody puts that on a leaderboard, because it's the hard one. It asks you to have a view about what good looks like and to be accountable when you're wrong.

The best token count is no token count. Not because counting is evil, but because the count is a way of dodging the only question worth asking: was the thing you built worth building? The brag worth making was never how much you spent. It's "I did a hard thing, and you'd be surprised how little it took."

Brian Root

Brian Root is a seasoned product management executive with a rich history at the helm of digital transformation in tech giants like Amazon and Walmart Labs. As the founder of Rooted in Product, he brings his expertise to early-stage startups and Fortune 100 companies alike, specializing in transforming product visions into reality through strategic leadership and system optimization.

https://www.rootedinproduct.com/brian-root-author-bio

© 2024 Rooted in Product, LLC. All rights reserved.