At 2 p.m. on a Tuesday, when half the country's office workers are simultaneously asking ChatGPT to summarize a document or explain a diagnosis, the response comes back in under three seconds. OpenAI's infrastructure team deserves credit for that. The latency problem, the one users complained about in 2023, is largely solved. So when people ask whether AI chatbots get slower when everyone uses them at once, the honest answer is: not in any way you'd notice.

That answer, though, is doing a lot of work to obscure a different question.

Speed Was Never the Bottleneck Worth Watching

The infrastructure investment that eliminated visible lag also eliminated the most legible signal that something might be wrong. When a system is slow, users wait, grow skeptical, and sometimes abandon it. When a system is fast and wrong, users trust it. OpenAI, Google, and Meta have spent enormous sums ensuring their models respond quickly, and the business logic is obvious: friction reduces engagement, and engagement is the product. The incentive to be fast and the incentive to be accurate are not the same incentive.

Nick Tiller's study, published last week through the Lundquist Institute, tested five chatbots on health topics including cancer, vaccines, and nutrition. Nearly half of the responses (49.6%) were problematic; close to 20% were rated highly problematic, with the potential to cause harm. A separate study of 21 AI models on clinical vignettes found failure rates exceeding 80% on differential diagnosis tasks. These numbers do not describe a system straining under load. They describe a system performing at its designed capacity and still getting it wrong roughly half the time.

Tiller's assessment of Grok is worth sitting with: it was the chatbot most likely to produce highly problematic responses, possibly because it was trained on X, which he called a "cesspit of misinformation." That is not a server problem. That is a data sourcing decision made by a company whose owner also controls the data source. The conflict of interest is structural, not incidental.

The Congestion That Actually Matters

Here is the concession I'll grant the optimists: the accuracy problems in these studies predate mass adoption, which means peak load may not be making things worse. The baseline was already this unreliable. That's a fair point. It does not, however, make the situation better.

What mass adoption does is scale the consequences. When 10 million people ask a chatbot about a cancer symptom on the same afternoon, the 49.6% error rate is not an abstract statistic. It is roughly 5 million people receiving responses the study rated problematic, and roughly 2 million receiving guidance that a researcher with a PhD called potentially harmful. The system isn't slower. It's just wrong at industrial scale, and the companies building it have no regulatory obligation to tell you that.

Sen. Josh Hawley has called for prosecuting AI harms the way we'd prosecute a human who caused them. The framing is politically convenient but analytically useful: if a doctor gave harmful advice to half their patients, we would not accept "the waiting room was full" as an explanation. We would ask what the doctor knew, when they knew it, and what they chose to do about it.

OpenAI, Google, and Meta know their error rates. They publish some of them selectively, in blog posts that emphasize improvement percentages rather than absolute failure counts. The question of whether your chatbot slows down at peak hours is, at this point, a distraction. The question worth asking is what these companies are required to disclose about accuracy, to whom, and under what conditions. Right now, the answer is: almost nothing, to almost no one, almost never.