Last month I helped a mid-size logistics company debug an agentic workflow built on LangChain and Azure OpenAI. The agent was supposed to route freight quotes through 3 internal APIs. In staging, it worked. In production, it hallucinated a carrier ID, sent a real booking request to a real trucking company, and committed the firm to a $14,000 shipment nobody ordered. The team caught it in 40 minutes. If they'd skipped the pilot and scaled this across their 11 regional offices, the number would have been six figures before lunch.
That's the case for pilots in 2026, and I think the current push toward portfolio discipline as a replacement for experimentation gets the causality backwards. The 95% pilot failure rate from MIT is real and ugly. But the lesson isn't "stop piloting." The lesson is that most organizations run pilots without exit criteria, without production gates, and without anyone responsible for deciding go or kill. That's a process failure, not a reason to skip the process.
You Can't Govern What You Haven't Broken
The KPMG numbers tell a story that portfolio discipline advocates keep glossing over. 80% of leaders now flag cybersecurity as their top AI barrier, up from 68% a year ago. 65% cite agentic system complexity. These aren't theoretical risks you can model in a spreadsheet and then fund your way past. They're failure modes you discover when a LangChain agent calls a tool it shouldn't, when a retrieval-augmented generation pipeline leaks PII through a poorly scoped vector store, when an autonomous agent loops on a malformed API response and racks up $9,000 in compute before your monitoring catches it.
I've seen all 3 of those in the last 6 months. Each one taught the team something no architecture review would have surfaced.
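Two of those failure modes, the out-of-scope tool call and the runaway loop, can be guarded with a few lines of code before any governance framework exists. Here's a minimal sketch of a tool allowlist plus a step-and-spend budget. The tool names, limits, and class shape are all illustrative assumptions, not any particular framework's API:

```python
# Hedged sketch: two cheap guardrails for an agent loop.
# Tool names, budget numbers, and the GuardedLoop shape are hypothetical.

ALLOWED_TOOLS = {"rate_lookup", "carrier_search", "quote_submit"}

class BudgetExceeded(Exception):
    pass

class GuardedLoop:
    def __init__(self, max_steps=25, max_cost_usd=50.0):
        self.max_steps = max_steps
        self.max_cost_usd = max_cost_usd
        self.steps = 0
        self.cost = 0.0

    def check_tool(self, tool_name):
        # Block any tool call outside the allowlist before it executes.
        if tool_name not in ALLOWED_TOOLS:
            raise PermissionError(f"tool {tool_name!r} not allowed")

    def charge(self, step_cost_usd):
        # Halt a malformed-response loop before it racks up compute spend.
        self.steps += 1
        self.cost += step_cost_usd
        if self.steps > self.max_steps or self.cost > self.max_cost_usd:
            raise BudgetExceeded(
                f"halted after {self.steps} steps, ${self.cost:.2f}"
            )
```

The point isn't that these twenty lines are sufficient governance. It's that you only know to write them after a pilot has looped.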
Arthur AI flagged this in their 2026 governance report: traditional AI governance frameworks don't address agentic systems that autonomously reason, plan, and act across tools. You can't write governance policy for behavior you've never observed. Pilots are where you observe it. Kill the pilots, and your governance framework is fiction.
Portfolio Discipline Without Pilot Data Is Just Budgeting
I'll grant the portfolio discipline crowd one thing: running 31 simultaneous pilots with no shared learnings and no production pathway is waste. Deloitte found nearly 50% of enterprises did exactly that in 2025. That's bad engineering management, and it deserves criticism.
But the fix being proposed, picking 3 bets and funding them to scale, assumes you already know which 3 bets will work. The 22% of organizations Microsoft calls Frontier Firms didn't get there by skipping experimentation. They got there by running pilots with clear success metrics and then promoting winners into production across an average of 7 business functions. The pilot wasn't the problem. The lack of a promotion pipeline was.
Here's what I'd tell any engineering lead staring at a portfolio discipline mandate from the C-suite: demand that every pilot has a 90-day gate with 3 possible outcomes. Promote to production. Pivot the approach. Kill it. No renewals, no extensions, no "we need more data." That's not portfolio discipline versus pilots. That's pilots done like an engineer instead of a committee.
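If it helps to make "no fourth option" concrete, the gate can literally be a data structure with exactly three outcomes and no extension path. This is a sketch under my own assumptions; the field names, the 90-day window, and the pivot threshold are illustrative, not a standard:

```python
# Hedged sketch: a pilot gate with exactly three outcomes and no renewals.
# Field names and the 50%-of-target pivot threshold are illustrative.
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum

class GateOutcome(Enum):
    PROMOTE = "promote"
    PIVOT = "pivot"
    KILL = "kill"

@dataclass
class PilotGate:
    started: date
    success_metric: str   # e.g. "quote routing accuracy"
    target: float
    observed: float

    @property
    def due(self) -> date:
        return self.started + timedelta(days=90)

    def decide(self, today: date):
        if today < self.due:
            return None  # gate not reached; no early calls, no extensions
        # Exactly three outcomes: met target, close enough to pivot, or kill.
        if self.observed >= self.target:
            return GateOutcome.PROMOTE
        if self.observed >= 0.5 * self.target:
            return GateOutcome.PIVOT
        return GateOutcome.KILL
```

Notice what's absent: there is no `extend()` method. That's the whole design.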
The 59% of leaders expecting measurable ROI within 12 months won't get it from a top-down portfolio plan any faster than from scattered experiments. They'll get it from teams that ship a pilot, break it, learn the failure mode, fix it, and ship again. That cycle takes weeks, not quarters. And it produces the kind of institutional knowledge (what breaks, where the data gaps are, which APIs can't handle agent traffic) that no strategy deck can substitute for.
The logistics company I mentioned? They fixed the carrier ID hallucination by adding a confirmation step and a lookup constraint. Took 2 days. They're now rolling the agent out to 4 offices. If they'd waited for a portfolio review to greenlight a full-scale deployment, they'd still be in a planning meeting, and they'd have no idea that carrier ID hallucination was even a risk.
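For the curious, the shape of that 2-day fix is roughly this: validate the model's carrier ID against a system of record, and gate the booking behind explicit confirmation. This is my reconstruction, not their code; the carrier IDs, function names, and return shape are all hypothetical:

```python
# Hedged sketch of the two fixes described above. Carrier IDs, names,
# and the confirm callback are hypothetical placeholders.

KNOWN_CARRIERS = {"CARR-001": "Acme Freight", "CARR-002": "Blue Line Trucking"}

def validate_carrier_id(carrier_id: str) -> str:
    # Lookup constraint: reject any ID the system of record doesn't know,
    # which is exactly what a hallucinated carrier ID will be.
    if carrier_id not in KNOWN_CARRIERS:
        raise ValueError(f"unknown carrier ID: {carrier_id!r}")
    return KNOWN_CARRIERS[carrier_id]

def book_shipment(carrier_id: str, amount_usd: float, confirm) -> dict:
    # Confirmation step: the booking call only fires if a human approves.
    name = validate_carrier_id(carrier_id)
    if not confirm(name, amount_usd):
        return {"status": "rejected", "carrier": name}
    return {"status": "booked", "carrier": name, "amount": amount_usd}
```

A hallucinated ID now dies at the lookup, and a real one still needs a human to say yes before a trucking company gets a booking request.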
Scale what you've broken. Not what you've budgeted.