Disney had one employee invoke Claude 460,000 times over 9 mid-April workdays. That's 51,000 requests per day. The internal dashboard tracking this was framed as a tool for "efficient and effective" resource use. What it actually became was a leaderboard, and engineers started competing on it. If that sounds familiar, it's because we ran this exact experiment with lines of code in the 1970s and spent the next 50 years explaining why it was a bad idea.

Token consumption as a productivity metric has the same structural flaw: it measures input, not output. Meta's 85,000 employees burned through 60 trillion tokens in 30 days on the company's internal Claudeonomics leaderboard, which handed out titles like "Token Legend" before Meta shut the whole thing down in early April. Visa is at 1.9 trillion tokens per month and giving prizes to power users. Jensen Huang told the All-In Podcast at GTC 2026 that he'd be "deeply alarmed" if a $500,000 engineer wasn't consuming at least $250,000 in tokens per year. These are not productivity benchmarks. They are consumption benchmarks, and conflating the two is the kind of mistake that looks obvious in retrospect.

When the Number Actually Means Something

Here's where I have to be honest about the tension in my own argument. Token volume is not always noise. A hedge fund analyst running $1,000 per day in tokens and generating a documented 5x productivity gain with greater than 200% ROI is a real signal. That's a controlled environment with measurable output, a clear cost, and a verifiable return. The token count there is a proxy for work done because someone actually checked.
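To see why that one holds up, run the arithmetic. A minimal sketch: the $1,000-per-day token spend and the 5x multiplier come from the reported case; the analyst's loaded daily cost is an assumption I'm supplying purely for illustration.

```python
# Back-of-the-envelope ROI check. The token spend and the 5x multiplier
# come from the reported case; ANALYST_COST_PER_DAY is an assumed figure.
TOKEN_COST_PER_DAY = 1_000       # reported daily token spend, USD
ANALYST_COST_PER_DAY = 2_000     # assumed loaded daily cost of the analyst, USD
PRODUCTIVITY_MULTIPLIER = 5      # reported 5x gain

# Value the analyst's output at their loaded cost, scaled by the multiplier.
baseline_value = ANALYST_COST_PER_DAY
incremental_value = baseline_value * PRODUCTIVITY_MULTIPLIER - baseline_value

roi = (incremental_value - TOKEN_COST_PER_DAY) / TOKEN_COST_PER_DAY
print(f"ROI on token spend: {roi:.0%}")  # 700% under these assumptions
```

The point isn't the exact number; it's that the calculation has an output side at all, which a raw consumption dashboard doesn't.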

Salesforce made this distinction explicit in April 2026, announcing a new productivity metric and rejecting "tokenmaxxing" outright as a measure of real work. That's the right call. The question isn't whether tokens correlate with productivity in some cases. They do. The question is whether raw consumption numbers, reported at the trillion-token scale across entire enterprises, tell you anything actionable. They don't, unless you also know what the tokens produced.

The Disney case is instructive precisely because the high-volume user's requests were coming from autonomous agents, not a human typing prompts. One staffer noted that Disney would simply raise the quota whenever someone hit a cap. So the number reflects agent activity, not human effort, and the two are not the same thing. An agent spinning in a retry loop because of a bad system prompt will rack up tokens just as fast as one doing useful work. Ask me how I know. I've debugged enough runaway LangChain pipelines to have opinions about this.
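Here's a minimal sketch of the guard that separates the two cases. `call_model`, `count_tokens`, and `accept` are stand-in callables for whatever client, tokenizer, and validation step you actually use; none of this is a real library API.

```python
import time

MAX_ATTEMPTS = 3
TOKEN_BUDGET = 50_000  # hard per-task cap; tune to your workload

def run_with_budget(prompt, call_model, count_tokens, accept):
    """Retry loop that fails loudly instead of burning tokens quietly.

    Without the budget check, an agent whose responses never pass
    validation retries forever and looks highly productive on a
    consumption dashboard.
    """
    spent = 0
    for attempt in range(MAX_ATTEMPTS):
        spent += count_tokens(prompt)
        if spent > TOKEN_BUDGET:
            raise RuntimeError(
                f"token budget exceeded on attempt {attempt + 1}: "
                "probably a retry loop, not useful work"
            )
        response = call_model(prompt)
        spent += count_tokens(response)
        if accept(response):
            return response, spent
        time.sleep(2 ** attempt)  # back off instead of hammering the API
    raise RuntimeError(f"gave up after {MAX_ATTEMPTS} attempts and {spent} tokens")
```

The design choice that matters is failing loudly: a runaway loop should surface as an error in your logs, not as a number on a leaderboard.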

What Should Replace It

The metric that matters is output per dollar of inference cost, measured against a baseline. For a coding assistant: pull requests merged, test coverage delta, time-to-close on issues. For an analyst workflow: decisions made per hour, accuracy on verifiable predictions. These are harder to instrument than a token counter, which is exactly why companies reach for the easy number instead.
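As a sketch of what that instrumentation could look like for the coding-assistant case (the field names and the single-number output are my own illustration, not anyone's shipped metric):

```python
from dataclasses import dataclass

@dataclass
class SprintStats:
    prs_merged: int            # pull requests merged this sprint
    coverage_delta: float      # test coverage change, percentage points
    inference_cost_usd: float  # token spend attributed to this sprint

def marginal_output_per_dollar(current: SprintStats, baseline: SprintStats) -> float:
    """Extra PRs merged per incremental inference dollar, measured against
    a pre-adoption baseline sprint. A real version would also weight PR
    size, coverage_delta, and issue time-to-close instead of counting PRs.
    """
    extra_prs = current.prs_merged - baseline.prs_merged
    extra_cost = current.inference_cost_usd - baseline.inference_cost_usd
    if extra_cost <= 0:
        raise ValueError("no incremental inference spend to attribute output to")
    return extra_prs / extra_cost

# Example: 12 extra merged PRs on $400 of extra spend -> 0.03 PRs per dollar.
print(marginal_output_per_dollar(
    SprintStats(prs_merged=30, coverage_delta=1.5, inference_cost_usd=450.0),
    SprintStats(prs_merged=18, coverage_delta=0.4, inference_cost_usd=50.0),
))
```

Note that the denominator is dollars and the numerator is shipped work, both as deltas against the baseline. A token counter has neither.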

Builders should push back when their organizations start treating token budgets as performance reviews. If your manager is excited that your team consumed 10 billion tokens last sprint, ask what shipped. If the answer is vague, the metric is doing harm. Token counts belong in your cost monitoring dashboard, next to your AWS bill, not in your performance review.

We already know how this story ends when you optimize for the wrong proxy. The codebase gets longer. The bill gets bigger. The product doesn't get better.