What happens with more than one agent

... And That's When Everything Broke

Jamal Khawaja·February 13, 2026


The Agentic Problem


I need to confess something. The first AI agent Symplii.ai built worked beautifully—elegant, responsive, almost eerily competent; the kind of thing you demo to a client and watch their pupils dilate with the possibilities. I was proud of it in the way that only someone who has spent too many years in the trenches of enterprise technology can be proud of a piece of software: not because it was novel, but because it worked. I thought I had figured something out. I thought Symplii.ai was ahead of the curve. I was, as it turns out, ahead of nothing; I was standing at the edge of a cliff and mistaking the view for progress.


The second agent was for a different client with different needs, and it worked too—just differently. Different error handling, different logging, different assumptions about what data it could touch and when. By the third agent, we had three autonomous systems in production, each a bespoke creation with its own behavioral fingerprint, its own silent assumptions about how the world was organized. And that is when things started to break in ways that we could not predict, and—most damningly—could not explain to the people paying for them. The experience was humbling in the way that only technological hubris can be, and it forced me to confront a question I should have been asking from the beginning.


The question is deceptively simple: what happens when you have more than one? The industry is so consumed with making individual agents smarter—pouring resources into model capability, reasoning benchmarks, the intellectual glamour of a single autonomous system performing a complex task—that it has collectively ignored the profoundly unglamorous reality underneath. Production environments do not consist of a single agent performing a complex task. They consist of many agents, built by different people at different times under different assumptions, all operating in the same organizational nervous system without a shared understanding of how to behave. The problem with agentic AI is not capability. It is coexistence.


There is a scene in Jurassic Park—the original, not all the other crappy ones—where Jeff Goldblum's Ian Malcolm tells the park's architects that their problem is not whether they can bring dinosaurs to life, but whether they can govern what those dinosaurs do once they are alive. I think about that scene more than I would like to admit, because the agentic AI industry is building its own version of the park: marveling at the creatures it can create while doing almost nothing to ensure those creatures can coexist without devouring the infrastructure around them. The gap between creation and governance is where catastrophe lives, and right now, the entire industry is standing in that gap pretending it is a feature.


The failures are never dramatic. They are insidious. An agent calling an API with credentials scoped for a different client. Two agents processing the same event because nobody defined who owned it. An exception handler in one system swallowing errors that another system needed to see. Logs that were technically complete but operationally useless—three agents, three formats, three levels of detail, zero coherence. And behind every one of these failures, talented engineers spending days reconstructing what happened in a workflow that should have been observable from the start, rebuilding context that should have persisted, debugging behavior authored by a model that left no breadcrumb trail of intent. The human cost of architectural neglect is not measured in downtime metrics. It is measured in the hours of toil demanded of the people asked to make sense of the wreckage.
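To make the "two agents processing the same event" failure concrete, here is a minimal sketch (all names and events are hypothetical, not from any Symplii.ai system). Notice that neither agent's code is wrong in isolation; the bug lives entirely in the missing ownership convention between them.

```python
# A stand-in for an event stream; in production this would be a
# queue or bus that every deployed agent subscribes to.
EVENTS = [{"id": "evt-001", "type": "invoice.created"}]

processed = []  # side effects land here; imagine an external billing system


def billing_agent(event):
    # Built first, for one client: assumes it owns all invoice events.
    if event["type"] == "invoice.created":
        processed.append(("billing_agent", event["id"]))


def reporting_agent(event):
    # Built months later, for a different purpose: also reacts to
    # invoice events, with no way to know billing_agent claimed them.
    if event["type"].startswith("invoice."):
        processed.append(("reporting_agent", event["id"]))


for event in EVENTS:
    # No ownership registry, no claim protocol: every agent sees
    # every event, so one event triggers two independent actions.
    billing_agent(event)
    reporting_agent(event)

# The same event produced two side effects -- correct by neither design.
assert len(processed) == 2
```

Each agent passes its own tests; the defect only exists in the space between them, which is exactly why it evades per-agent review.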


The temptation is to reach for the tools that already exist—orchestration frameworks, observability platforms, API gateways—and declare the problem solved. But agentic systems introduce a category of complexity that these tools were never designed to address. An orchestration framework built for microservices does not know what to do with an entity that reasons, adapts, and makes decisions outside of the logic its developers wrote. An observability platform can tell you what happened; it cannot tell you why an agent decided to do what it did, or whether that decision was consistent with what it was supposed to do, or whether the permissions it invoked were the permissions it should have had. We are fitting nineteenth-century harnesses onto twenty-first-century animals and wondering why they keep bolting through the fence.


I do not have all of the answers yet. That admission costs me something, because the instinct to present certainty—to position myself and Symplii.ai as the team that has already solved what others are still struggling with—is an instinct I have spent decades cultivating and only recently begun to distrust. What I do have are the questions, and I have become increasingly convinced that the questions themselves are more valuable than any premature architecture diagram. Over the next several months, I, along with my talented team at Symplii.ai, intend to pull this thread: to name the structural problems that emerge when agentic AI moves from demo to production, to examine why the current answers fall short, and to explore what it would actually take to build something that works at scale without requiring an act of faith every time you deploy a new agent.


This is the first installment of that exploration. If you have been building agents and watching them misbehave in ways that feel familiar, I suspect you already know what I am talking about. If you have not yet experienced it, give it time. The second agent is where the education begins.


Jamal, Founder — Symplii.ai