Yesterday, Anthropic dropped something called Claude Mythos Preview, a cybersecurity AI model with restricted access for select companies. Today, OpenAI is running a pilot called Trusted Access for Cyber, handing advanced offensive and defensive hacking tools to a small group of partners it vetted itself. Cybersecurity stocks fell. Florida's attorney general opened an investigation. And the rest of us found out about all of this by reading the news.

That last part is the problem.

Both companies are framing these staggered rollouts as responsible innovation. The logic goes: these models are too powerful for general release, so we'll give them to trusted partners first, learn from that, and protect everyone else in the meantime. OpenAI even sweetened the deal with $10 million in API credits for pilot participants. Which, sure, sounds generous until you realize that $10 million in credits also buys a lot of goodwill from the very companies that might otherwise be loudest in demanding public accountability.

"Trusted Partners" Is Doing a Lot of Heavy Lifting Here

I am not a cybersecurity expert. I am a guy who bought a mesh router last spring and felt genuinely proud of himself. But I do not need to understand exploit chains to notice that "we decided who gets access" is not a safety policy. It is a guest list.

OpenAI says the Trusted Access for Cyber initiative aims to "enhance baseline safeguards for all users while piloting trusted access for defensive acceleration." That sentence is doing so much work it needs a union card. What it means in practice: a private company decided which other private companies are trustworthy enough to test tools that can apparently disrupt critical infrastructure, and the public gets a press release.

To be fair, the alternative of just releasing these models to everyone would be genuinely bad. I will grant that. Offensive cybersecurity AI in the hands of anyone with a credit card is a real problem, not a hypothetical one. But the answer to "this is dangerous" cannot be "so we'll manage it ourselves, quietly." That is how you get Florida attorneys general asking whether the data is going to Beijing, which is a chaotic way to learn about a risk that should have been disclosed months ago.

The Benchmark Blackout Is Back

This is not the first time OpenAI has done this. When the o1 series launched in 2024, the company faced accusations of withholding benchmark results that would have let outside researchers assess actual capabilities. The pattern is consistent: release to insiders, manage the narrative, let the public catch up later. Staggered access is not inherently dishonest, but it becomes dishonest when there is no independent verification of what is being withheld and why.

George Ralph, Global Managing Director at RFA, noted this week that "threats are becoming increasingly sophisticated." He is right. Which is exactly why the companies building the most sophisticated threat-adjacent tools should not be the only ones evaluating whether those tools are safe.

Here is what I actually want: before any cybersecurity AI pilot launches, OpenAI and Anthropic should be required to publish an independent third-party risk assessment. Not a blog post. Not a press release with a $10 million number in it. A real document, reviewed by people who do not work for them, describing what the model can do and what happens if it goes wrong.

Congress could mandate this. The U.S. AI Safety Institute could require it as a condition of federal contracts. Someone with actual authority needs to step in, because right now the companies are grading their own homework and handing out the A's themselves.