Blog · 2024-W26 · 24 June 2024 · Partial

The prediction

None of the three leading multi-agent frameworks of mid-2024 will be the production standard by end of 2025. The winning shape will be a model-vendor-owned protocol, not a Python framework, and Anthropic is best positioned to ship it.

Verification window: by 2025-12-31 · confidence medium

Verified in 2025-W09

The Multi-Agent Standoff: Why AutoGen, CrewAI, and ChatDev Are All Wrong

The agentic frameworks of Q2 2024 are an industry inhaling its own exhaust. AutoGen ships a roleplay layer. CrewAI ships a roleplay layer with friendlier abstractions. ChatDev ships a roleplay layer with a software-development costume. None of the three solve the actual production problem, which is not "how does one agent talk to another." It is "how does any agent invoke any tool, anywhere, with auth, with observability, and without becoming a vendor's pet."

Our prediction: by end of 2025 the production standard will not be a Python framework. It will be a model-vendor-owned protocol. Anthropic is the best positioned to ship it.

What the frameworks get wrong

The current frameworks are roleplay-shaped because their authors come from research conferences, not deployment. They optimize for "look, the agents are arguing about the marketing plan." Real buyers do not pay for agent arguments. Real buyers pay for the boring stack underneath.

The boring stack is four pieces. A tool-invocation contract that any model can call. A capability discovery mechanism so the model knows what it can do at request time. A permission and audit layer so the enterprise can trust it. And a transport that does not require the buyer to rewrite their entire integration to switch model providers.
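To make the stack concrete, here is a minimal Python sketch of two of the four pieces: a tool-invocation contract and a permission-and-audit layer. It illustrates the idea under our own assumptions and is not any vendor's spec. Every name in it (ToolSpec, AuditedToolHost, required_scope, the finance:read scope, the lookup_invoice tool) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass(frozen=True)
class ToolSpec:
    """Hypothetical tool-invocation contract: what a model is told about one tool."""
    name: str
    description: str
    parameters: dict[str, Any]  # JSON-Schema-style argument description
    required_scope: str         # permission the caller must hold (hypothetical field)


@dataclass
class AuditedToolHost:
    """Hypothetical host: runs registered tools behind a scope check and an audit log."""
    tools: dict[str, tuple[ToolSpec, Callable[..., Any]]] = field(default_factory=dict)
    audit_log: list[dict[str, Any]] = field(default_factory=list)

    def register(self, spec: ToolSpec, fn: Callable[..., Any]) -> None:
        self.tools[spec.name] = (spec, fn)

    def invoke(self, caller: str, scopes: set[str], name: str, args: dict[str, Any]) -> Any:
        spec, fn = self.tools[name]
        allowed = spec.required_scope in scopes
        # Record the attempt before acting, so denied calls are audited too.
        self.audit_log.append({"caller": caller, "tool": name, "args": args, "allowed": allowed})
        if not allowed:
            raise PermissionError(f"{caller} lacks scope {spec.required_scope!r} for {name}")
        return fn(**args)


host = AuditedToolHost()
host.register(
    ToolSpec(
        name="lookup_invoice",
        description="Fetch an invoice by id",
        parameters={"type": "object", "properties": {"invoice_id": {"type": "string"}}},
        required_scope="finance:read",
    ),
    lambda invoice_id: {"id": invoice_id, "status": "paid"},  # stand-in for a real system
)
print(host.invoke("agent-1", {"finance:read"}, "lookup_invoice", {"invoice_id": "INV-7"}))
# {'id': 'INV-7', 'status': 'paid'}
```

The design choice worth noticing: the permission check and the audit record live in the host, not in the agent, so swapping the model or the framework above it weakens neither.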

Frameworks live above this stack. They are not the stack.

Why a model-vendor protocol wins

Three reasons.

First, the frontier-model vendors have direct authority over how their models call tools. Whoever defines the wire format gets to define the default. OpenAI tried this with function calling in 2023 but kept it proprietary. That was a tactical mistake: it locked enterprises into a single vendor, and the market punishes lock-in.

Second, the right protocol must be open. Enterprises will not commit their internal tools to a single vendor's plugin format. Whoever opens the spec first sets the standard.

Third, the protocol must include capability discovery, not just invocation. The frameworks have not solved capability discovery at all. They expect the developer to register every tool by hand. That does not scale to enterprises with thousands of internal tools.
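A minimal sketch of what capability discovery means in practice, assuming a hypothetical catalogue server. CapabilityServer, list_capabilities, tools_for_request and the scope strings are our own inventions, not a real API. The client asks at request time which tools the caller may use, so a newly published tool appears without anyone editing client code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    name: str
    description: str
    scope: str  # permission needed to see and use the tool (assumed convention)


class CapabilityServer:
    """Hypothetical enterprise tool catalogue; teams publish tools here."""

    def __init__(self) -> None:
        self._catalogue: dict[str, Capability] = {}

    def publish(self, cap: Capability) -> None:
        self._catalogue[cap.name] = cap

    def list_capabilities(self, scopes: set[str]) -> list[Capability]:
        # Only advertise tools the caller may actually use, sorted by name.
        visible = (c for c in self._catalogue.values() if c.scope in scopes)
        return sorted(visible, key=lambda c: c.name)


def tools_for_request(server: CapabilityServer, scopes: set[str]) -> list[dict[str, str]]:
    """Build the tool list handed to the model for this one request."""
    return [{"name": c.name, "description": c.description}
            for c in server.list_capabilities(scopes)]


server = CapabilityServer()
server.publish(Capability("hr_lookup", "Find an employee record", "hr:read"))
server.publish(Capability("erp_po_status", "Check a purchase order", "erp:read"))
print([t["name"] for t in tools_for_request(server, {"erp:read"})])
# ['erp_po_status']

# Another team publishes a tool; the client code above does not change.
server.publish(Capability("erp_invoice", "Fetch an invoice", "erp:read"))
print([t["name"] for t in tools_for_request(server, {"erp:read"})])
# ['erp_invoice', 'erp_po_status']
```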

Why Anthropic is best positioned

Anthropic is structurally better placed than OpenAI here. They have no plugin store to protect. Their enterprise posture is aligned with the policy expectations of US-government and US-aligned sovereign customers, which is the buyer class that needs the protocol most. And they have publicly committed to interpretability and safety primitives that map cleanly onto permission and audit at the protocol layer.

Our specific prediction: Anthropic ships a tool-and-resource protocol by end of 2024 or early 2025, in the open, with a reference server implementation. Within twelve months of that release, the major frameworks reposition as clients of the protocol, not competitors to it.

We are calling the shape Model Capability Protocol, MCP, or something close. We will name it whatever Anthropic names it when they ship.

Where we might be wrong

We could be wrong on the vendor. OpenAI could open-source a competing spec under regulatory pressure. We weight this at thirty percent.

We could be wrong on the timing. Anthropic might prioritize Opus and Sonnet release cadence over protocol work through 2024 and push the release into mid-2025. We weight this at twenty-five percent.

We could be right on shape but wrong on adoption. The protocol ships, but the frameworks fight it for eighteen months before capitulating. We weight this at twenty percent. If this is what happens, we grade the call as partial.

What this means for the Gulf

The protocol question matters more here than in the US because of two local conditions. First, every serious GCC buyer needs Arabic capability plus tool access into legacy enterprise systems that often pre-date the cloud era. A clean protocol unlocks those integrations cheaply. A framework war drags them through pilot purgatory for years. Second, the Gulf sovereign labs are best served by an open protocol because they can build clients without negotiating commercial terms with US labs. Falcon on MCP is a credible product. Falcon on AutoGen is a science project.

For GCC operators, our advice is direct. Do not build production workflows on AutoGen, CrewAI, or ChatDev today. Build them on direct tool-invocation APIs from one frontier vendor, with an abstraction layer thin enough to swap the underlying transport when the protocol lands. The frameworks are convenient for prototyping. They will be technical debt by the end of 2025. We will grade this prediction when the protocol ships and again twelve months later.
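A sketch of the thin abstraction layer we mean, under stated assumptions. ToolTransport, DirectVendorTransport, OpenProtocolTransport, check_po_workflow and the erp_po_status tool are hypothetical names. The handlers dictionary stands in for the glue you would write around one vendor's tool-calling API, and the send callable stands in for whatever client the eventual protocol ships; we are not assuming any real SDK.

```python
from typing import Any, Callable, Protocol


class ToolTransport(Protocol):
    """Hypothetical interface: the only thing workflow code depends on."""
    def call(self, tool: str, args: dict[str, Any]) -> dict[str, Any]: ...


class DirectVendorTransport:
    """Today: tools run via glue around one vendor's tool-calling API (assumed)."""

    def __init__(self, handlers: dict[str, Callable[..., dict[str, Any]]]) -> None:
        self._handlers = handlers

    def call(self, tool: str, args: dict[str, Any]) -> dict[str, Any]:
        return self._handlers[tool](**args)


class OpenProtocolTransport:
    """Later: the same calls routed to a protocol server. `send` is a placeholder."""

    def __init__(self, send: Callable[[dict[str, Any]], dict[str, Any]]) -> None:
        self._send = send

    def call(self, tool: str, args: dict[str, Any]) -> dict[str, Any]:
        return self._send({"tool": tool, "arguments": args})


def check_po_workflow(transport: ToolTransport, po_id: str) -> str:
    """Production workflow: knows nothing about which transport it runs on."""
    result = transport.call("erp_po_status", {"po_id": po_id})
    return f"PO {po_id}: {result['status']}"


# Swap the transport without touching the workflow.
direct = DirectVendorTransport({"erp_po_status": lambda po_id: {"status": "approved"}})
via_protocol = OpenProtocolTransport(lambda req: {"status": "approved"})
print(check_po_workflow(direct, "4711"))        # PO 4711: approved
print(check_po_workflow(via_protocol, "4711"))  # PO 4711: approved
```

The workflow function takes the interface, not a vendor client, so the migration when the protocol lands is one constructor call rather than a rewrite.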
