Last month, an AI agent called Latent-Y designed therapeutic antibodies from text prompts and hit 6 of 9 targets with single-digit nanomolar affinities, 56 times faster than traditional methods. No human pipetted anything. The confirmation came from the lab, not the model. That is a genuinely extraordinary result, and I want to be precise about what it tells us and what it does not.
It tells us that AI can execute the mechanics of science at a speed and scale that no human team can match. It does not tell us that AI should replace human scientists. Those are different claims, and conflating them is how we make expensive institutional mistakes.
What the Numbers Actually Show
The Kosmos system, demonstrated at NVIDIA's GTC conference in late March, reproduced 3 existing findings and generated 4 novel contributions, including a causal link between SOD2 and reduced myocardial fibrosis. Stanford and Princeton's LabOS XR system pairs AI agents with physical robots to run real experiments, motivated in part by a reproducibility crisis in which 70% of biomedical scientists cannot replicate their colleagues' work and 50% cannot replicate their own. These are not marginal improvements. They are structural fixes to problems that have plagued human-run labs for decades.
But here is the methodological tension I keep returning to: the same literature that trains these AI systems is saturated with publication bias. Roughly 95% of life sciences papers focus on just 5,000 well-studied human genes. An AI that reads the literature is not reading science; it is reading the science that got funded, survived peer review, and confirmed what editors expected. The hypothesis space it explores is shaped by every bias baked into that corpus. That is not a small caveat. It is a structural ceiling on what autonomous systems can discover without human redirection.
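To see why that ceiling is structural rather than a footnote, here is a toy simulation. It is a sketch under stated assumptions, not any real system's training pipeline: the only inputs taken from the figures above are the 95% concentration on 5,000 genes, out of roughly 20,000 human protein-coding genes. The corpus size and proposal count are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

N_GENES = 20_000          # approximate human protein-coding genes
WELL_STUDIED = 5_000      # genes receiving ~95% of papers (figure from the text)
N_PAPERS = 1_000_000      # corpus size (illustrative)

# Build a literature corpus: 95% of papers land on the well-studied genes.
weights = np.full(N_GENES, 0.05 / (N_GENES - WELL_STUDIED))
weights[:WELL_STUDIED] = 0.95 / WELL_STUDIED
paper_counts = rng.multinomial(N_PAPERS, weights)

# A literature-trained hypothesis generator: proposes targets roughly in
# proportion to how often each gene appears in the corpus.
proposal_probs = paper_counts / paper_counts.sum()
proposals = rng.choice(N_GENES, size=10_000, p=proposal_probs)

in_well_studied = np.mean(proposals < WELL_STUDIED)
coverage = len(np.unique(proposals)) / N_GENES

print(f"proposals on well-studied genes: {in_well_studied:.1%}")
print(f"fraction of genome ever proposed: {coverage:.1%}")
```

Run it and the generator spends about 95% of its proposals on the same 5,000 genes the literature already favors; only a few percent of the remaining 15,000 genes ever get proposed at all. No amount of speed fixes that distribution. Only a deliberate human decision to reweight the search does.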
Dr. Hector Zenil at the King's Institute for AI put the deeper problem plainly: future AI systems may explore hypothesis spaces so vast that human scientists can never fully catch up. He calls this "alien science," discoveries that are effective but incomprehensible. Think of it like a map drawn in a language no one can read. The territory is real. The directions are useless.
The Question Science Cannot Outsource
Andrew Beam, CTO of Lila Sciences, offered the most honest framing I have seen: scientists who use AI will replace those who do not. Fair point. Adoption matters. But Beam's framing still assumes scientists are in the room, making calls about what the AI should pursue next. That assumption is doing a lot of work.
The April 2 paper in Frontiers in Artificial Intelligence proposes closed-loop systems that run the full scientific method without human oversight. I find the engineering genuinely exciting. I find the governance proposal alarming. Science is not just a method for generating true statements. It is a social process for deciding which true statements matter enough to act on. That judgment requires values, and values require accountability. An autonomous system has neither.
So here is my specific position: research institutions should deploy AI agents aggressively for hypothesis generation, experimental execution, and reproducibility checking. Those are exactly the bottlenecks where the speed gains are real and the risks of AI error are catchable. But the decisions about which research directions to fund, which findings to translate into policy, and which anomalies deserve deeper investigation should stay with human scientists. Not because humans are infallible. Because humans can be held responsible when they are wrong.
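For concreteness, here is what that division of labor looks like as a control loop. This is a hypothetical sketch: every name in it (Finding, ai_generate_hypotheses, human_sets_direction, the thresholds) is invented to illustrate the boundary, and none of it is taken from Kosmos, LabOS, or any shipping system.

```python
import random
from dataclasses import dataclass

@dataclass
class Finding:
    hypothesis: str
    effect_size: float
    replicated: bool

def ai_generate_hypotheses(n: int) -> list[str]:
    # Stand-in for a literature-trained hypothesis generator.
    return [f"candidate-{i}" for i in range(n)]

def ai_run_and_replicate(hypothesis: str) -> Finding:
    # Stand-in for robotic execution plus an independent replication run:
    # the bottlenecks where AI speed gains are real and errors are catchable.
    effect = round(random.uniform(0.0, 1.0), 2)
    return Finding(hypothesis, effect, replicated=effect > 0.3)

def human_sets_direction(findings: list[Finding]) -> list[Finding]:
    # The step that stays human: choosing which replicated results to fund,
    # translate, or investigate further. This stub filters on a threshold,
    # but in practice it is a named person who can be held responsible.
    return [f for f in findings if f.replicated and f.effect_size > 0.5]

if __name__ == "__main__":
    random.seed(0)
    findings = [ai_run_and_replicate(h) for h in ai_generate_hypotheses(10)]
    for f in human_sets_direction(findings):
        print(f"approved for follow-up: {f.hypothesis} (effect={f.effect_size})")
```

The placement of human_sets_direction is the whole argument. A fully closed-loop proposal would swap another model into that slot; keeping a person there, inside the loop rather than reviewing outputs after the fact, is where accountability lives.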
The 56-fold speedup is real. Use it. Just do not mistake acceleration for wisdom.