On Wednesday night I sat staring at two deploy buttons. One was my scrappy LangGraph agent running on a $9/month VPS — duct-taped together with Redis for memory, a homegrown sandbox I'd written three weekends ago, and a credentials file I still felt bad about. The other was Anthropic's new Managed Agents dashboard, asking me for $0.08 per runtime-hour. That's about $58/month if I left it on 24/7. Six times more expensive.
I pressed the managed one.
Not because I'd gone soft. Because I'd just finished writing a 400-line retry loop to handle a sandbox that kept OOMing on long tool calls, and Anthropic was offering to delete that file from my life. Three to six months of infrastructure work, gone. That's the pitch of the week, and it's working — but it comes with a trade none of the launch posts want to talk about.
This week — April 13-19, 2026 — wasn't just another product cycle. It was the week the agent platform wars turned into a platform consolidation. Three simultaneous launches, one new Linux Foundation project, and one quiet market share number that tells you who's actually winning.
The 5-Minute Skim
- What changed this week: Anthropic launched Managed Agents (flat $0.08/runtime-hour, April 8). OpenAI updated its Agents SDK with sandbox execution, long-horizon tasks, and multi-provider support (April 15). The Agentic AI Foundation formalized under the Linux Foundation with Anthropic, OpenAI, and Block as founding members. Claude Opus 4.7 shipped the same week with advanced SWE capabilities.
- The number nobody's quoting: OpenAI's share of enterprise LLM API spend has dropped from ~50% in 2023 to 27% in 2026. Market share is following openness, not coordination features. Anthropic gained by not building a walled garden.
- Default recommendation: If you're a team of 1-5 shipping in under a quarter, use Anthropic's Managed Agents. If you're a platform team that already runs its own infra, use OpenAI's Agents SDK with BYO sandbox. Only pick LangGraph/CrewAI if you genuinely need graph-level control of the orchestration — most teams don't.
- Failure mode to expect: Over-permissioned agents, credential sprawl, and skill-package supply-chain attacks (see: the "OpenClaw" incident below). State management fails first; observability fails second.
- The trade-off: Managed platforms hide the hardest problems (state, credentials, governance) behind the "enterprise tier" bill. DIY forces you to solve them. There is no free option — you pay in dollars or you pay in on-call pages.
Why did three platforms ship agent runtimes in the same week?
This didn't happen by accident. The vendors have been watching the same graph: enterprise agent deployments went from demo toys in 2024 to real production workloads in 2025, and every one of them bled budget on infrastructure no one wanted to maintain. Teams were writing their own sandbox runners, their own memory stores, their own session replay — five times over, badly.
On April 8, Anthropic shipped Managed Agents as a public beta. The pitch is ruthless: $0.08 per runtime-hour, flat. No CPU tiers, no memory tiers, no per-tool-call charges. The harness — memory, sandbox, state persistence, session logs, tool orchestration — is all included. They claim it compresses three to six months of infra work into an afternoon, and having just spent three weekends on a sandbox, I believe them.
One week later, on April 15, OpenAI pushed a major Agents SDK update. Instead of running the sandbox themselves, they let you plug in E2B, Modal, Cloudflare, or Vercel. Python-first. Long-horizon tasks. Filesystem tools. The strategy is visibly different: OpenAI wants to be the coordination layer, not the runtime. "Bring your own everything — we'll orchestrate."
The same week, Anthropic shipped Claude Opus 4.7 with stronger SWE-bench numbers, and the Agentic AI Foundation (AAIF) was formalized under the Linux Foundation. Founding members: Anthropic, OpenAI, Block. Platinum sponsors: Google, Microsoft, AWS, Bloomberg, Cloudflare. MCP — which hit 97M+ monthly downloads and 10,000+ servers — was donated to AAIF along with Block's goose framework and the AGENTS.md spec (now adopted by 60,000+ OSS projects).
In other words: the protocols went neutral. The runtimes went proprietary. Pick your side.
Three approaches, told as a story
Imagine three teams, all trying to ship the same customer-support triage agent.
Team A picked Anthropic Managed Agents. They wrote a system prompt, defined three tools, and pointed at a filesystem. Anthropic's harness handles memory windows, session persistence across days, sandbox execution, and automatic state compaction when context gets heavy. The team shipped in four days. Their bill for the first month was $62 — one agent, running 24/7, with spiky load. They didn't touch credentials beyond a single API key. They didn't touch sandbox isolation. They don't know what kernel their agent is running on.
Team B picked OpenAI's Agents SDK. They already had Modal running for batch jobs and didn't want another runtime. They wired up the SDK as the coordination layer, pointed at their existing Modal sandbox, brought their own secrets manager, and used their own OpenTelemetry setup. The SDK handled tool calling, multi-step planning, and the tricky parts of long-horizon tasks. They shipped in two weeks. Their bill is model tokens plus Modal compute — roughly flat with their previous LangChain setup, but with far less orchestration code.
Team C picked LangGraph with CrewAI patterns. They're a five-person platform team and they wanted every knob. They wrote the graph, the state store, the sandbox, the retry logic, the session logger, the credential vault. They shipped in eight weeks. Their infrastructure bill is lower per-agent-hour than either A or B. Their on-call volume is higher than both combined. When the CEO asked "why don't we just use managed?" they had to write a six-page doc about control-plane sovereignty.
All three agents work. All three teams made rational choices. The difference is where they chose to spend their complexity budget.
Notice the line keeps moving up the stack. Managed hides almost everything. Hybrid hides coordination only. DIY hides nothing. The question isn't which is "better" — it's which boundary matches your team's actual constraints.
What patterns are holding up in production?
Three patterns dominate real agent deployments right now, and they're the ones to design for.
Hub-and-spoke is running the show. A TrueFoundry survey of multi-agent systems found that 66.4% of production deployments use a hub-and-spoke topology: one orchestrator agent delegates to specialist sub-agents. It's not because peer-to-peer is worse in theory — it's because hub-and-spoke is the only pattern you can actually debug at 3 AM. The orchestrator becomes the single point of observation, the single point of retry, and the single point of blame. You pay a latency tax of roughly 2-5 seconds per delegation cycle, and the pattern visibly breaks around seven sub-agents — context windows blow up, coordination errors compound, and the orchestrator starts contradicting itself. Below seven, it's remarkably stable.
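The hub-and-spoke shape is easy to sketch. This is a minimal illustration of the topology, not any vendor's API — the `Orchestrator` and `SubAgent` classes, their method names, and the routing logic are all hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class SubAgent:
    """A specialist that handles one task category."""
    name: str
    skills: set

    def handle(self, task: str) -> str:
        return f"{self.name} handled: {task}"

@dataclass
class Orchestrator:
    """The hub: the single point of routing, retry, and observation."""
    spokes: list = field(default_factory=list)
    log: list = field(default_factory=list)

    def delegate(self, task: str, category: str) -> str:
        # Keep spoke count at or below ~7: past that, coordination errors
        # compound and the hub starts contradicting itself.
        assert len(self.spokes) <= 7, "add another hub instead of an eighth spoke"
        for spoke in self.spokes:
            if category in spoke.skills:
                result = spoke.handle(task)
                self.log.append((spoke.name, task))  # one place to observe everything
                return result
        raise LookupError(f"no specialist for {category!r}")

hub = Orchestrator(spokes=[
    SubAgent("refund-bot", {"refunds"}),
    SubAgent("triage-bot", {"triage"}),
])
print(hub.delegate("ticket #42: duplicate charge", "refunds"))
```

The point of the shape is that `hub.log` is the one place you look at 3 AM — every delegation flows through it.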
Context engineering has become a real discipline. Anthropic published an essay this week — Effective Context Engineering for AI Agents — that's worth reading in full. The core idea: you don't stuff everything into the context window; you engineer what goes in and when. Key techniques include just-in-time retrieval (load tool outputs only when needed), state compaction (summarize old turns when context gets heavy), and structured memory (separate short-term scratch from long-term persistence). The Managed Agents harness implements all of this invisibly. If you go DIY, you will re-invent it badly before you re-invent it well.
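Two of those techniques — structured memory and just-in-time retrieval — fit in a few lines. This is an illustrative sketch of the idea, not the Managed Agents harness API; the `ContextBuilder` class and its method names are invented for the example:

```python
class ContextBuilder:
    """Layered memory: durable facts live in a keyed store, working notes
    in scratch, and the prompt pulls facts just-in-time per task."""

    def __init__(self):
        self.long_term = {}   # durable facts, keyed so they can be retrieved
        self.scratch = []     # short-term working notes for the current task

    def remember(self, key: str, fact: str):
        self.long_term[key] = fact

    def note(self, text: str):
        self.scratch.append(text)

    def build(self, task: str, needed_keys: list) -> str:
        # Just-in-time retrieval: include only the facts this task names,
        # not the whole store.
        facts = [self.long_term[k] for k in needed_keys if k in self.long_term]
        return "\n".join([task, *facts, *self.scratch])

ctx = ContextBuilder()
ctx.remember("customer", "customer tier: enterprise")
ctx.remember("billing", "billing cycle: monthly")    # stays out of this prompt
ctx.note("ticket mentions a duplicate charge")
print(ctx.build("triage ticket #7", needed_keys=["customer"]))
```

The billing fact is persisted but never enters the window until a task asks for it — that's the whole discipline in miniature.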
State is where everything fails first. Every production incident I've read about this cycle traces back to state management. Agents that forget what they were doing. Agents that remember too much and contradict earlier decisions. Agents that can't resume after a crash. The managed harnesses solve this by making state persistence a first-class primitive. The DIY stacks treat it as a Redis key, and that's where the cracks appear first.
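What "state as a first-class primitive" means in practice: checkpoint after every step so a crash is resumable. A minimal sketch, with JSON on local disk standing in for whatever durable store you'd actually use — the class shape is hypothetical:

```python
import json
import os
import tempfile

class AgentState:
    """Checkpointed agent state: done steps and pending steps survive a crash."""

    def __init__(self, path: str):
        self.path = path
        self.steps_done = []
        self.pending = []

    def checkpoint(self):
        with open(self.path, "w") as f:
            json.dump({"done": self.steps_done, "pending": self.pending}, f)

    @classmethod
    def resume(cls, path: str) -> "AgentState":
        state = cls(path)
        if os.path.exists(path):
            with open(path) as f:
                saved = json.load(f)
            state.steps_done = saved["done"]
            state.pending = saved["pending"]
        return state

# Simulate a crash mid-run, then a resume from disk.
path = os.path.join(tempfile.mkdtemp(), "agent.json")
s = AgentState(path)
s.pending = ["fetch_ticket", "classify", "reply"]
s.steps_done.append(s.pending.pop(0))
s.checkpoint()          # the process dies after this line...
resumed = AgentState.resume(path)
print(resumed.pending)  # → ['classify', 'reply'] — it knows where it left off
```

The managed harnesses do exactly this for you, invisibly; the DIY version is a Redis key until the first crash proves it shouldn't be.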
Real outcomes from real teams
A fintech I talked to this week migrated a three-agent fraud-review workflow from LangGraph to Anthropic Managed Agents. Build time went from six weeks to four days. Their per-review cost went up by 40% — but their on-call volume dropped so hard they reassigned two engineers off the project. Net headcount savings paid for the managed premium five times over.
Block — one of the AAIF founding members — is pushing the opposite direction. They're betting on goose, their open-source agent framework, precisely because they don't want to be locked to any single vendor's runtime. The donation of goose to AAIF this week is a strategic move: commoditize the runtime, compete on data and distribution.
Then there's the failure case. The "OpenClaw" incident hit a community Discord this month — a popular shared skill package (think: npm for agent skills) was found to contain both data exfiltration and prompt-injection payloads. Teams that had blindly installed the skill to accelerate development ended up leaking customer support transcripts to an attacker-controlled endpoint. Nothing about the managed harnesses prevents this — the skill ran with the agent's permissions because that's what skills do. Framework capture creates a supply-chain attack surface that looks exactly like the npm/pip ecosystem circa 2018, and we haven't built the defenses yet.
A large enterprise platform team (Fortune 100, can't name them) found that after six months of agent rollouts, their AWS IAM directory had grown by 14,000 new roles — one per agent deployment, most over-permissioned, most never audited. Credential sprawl grows with every deployment, and audit capacity doesn't. Nobody budgets for this.
The trade-offs, argued as a debate
Let me argue this as three voices.
The Managed Advocate says: "Look, 90% of teams aren't going to out-engineer Anthropic or OpenAI on sandbox isolation, memory compaction, or session replay. You're paying $58/month to skip three months of work. Your engineers are worth more than that per hour. The flat $0.08/runtime-hour pricing is the most honest pricing in the industry — no surprises, no per-call gotchas. If you're under 50 agents and you're not a platform company, go managed."
The Hybrid Pragmatist says: "Vendor lock-in at the runtime layer is the worst kind of lock-in. If Anthropic deprecates a harness feature, your agents break silently. OpenAI's approach is sane — own the coordination, swap the runtime. I can run the same SDK against E2B today and Modal tomorrow. Portability is a real asset. The Managed pitch is compressed time-to-market; the cost is that when you want to leave, there's no door."
The DIY Purist says: "Both of you are ignoring governance. Managed Agents hides state, credentials, and audit trails behind the vendor's abstraction. My compliance officer needs to see what data crosses what boundary, and 'trust Anthropic' isn't an answer in regulated industries. LangGraph gives me the full graph, inspectable, in my VPC. Yes, I spent eight weeks building what Anthropic gives you in four days. But I can testify in court about what my agent did."
All three are right. The right framework is the one that matches your constraints: regulatory exposure, team size, latency budget, and exit strategy. Don't let a launch post pick for you.
One asymmetry worth naming: the managed platforms hide the work; they don't eliminate it. State management, credential lifecycles, access governance, and incident response still exist. You're just renting someone else's solution. That's often fine. It's never free.
What I'd do differently, having watched this week
The implementation insights that matter:
The biggest challenge nobody warns you about is that debugging an agent is fundamentally harder than debugging a service. A service has a request, a response, and a stack trace. An agent has a trajectory — a sequence of tool calls, intermediate reasoning, context windows that got compacted, and decisions that depend on prior context you no longer have. Managed platforms give you session replay; DIY stacks almost never do. If you go DIY, invest in trajectory logging before you invest in anything else.
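Trajectory logging doesn't need to be elaborate to be useful. A minimal sketch of the idea — record every event with a kind and a payload, and make replay a first-class operation. The class and event names are illustrative; a real stack would ship these to durable storage:

```python
import json
import time

class TrajectoryLog:
    """Append-only record of tool calls and compactions, replayable later."""

    def __init__(self):
        self.events = []

    def record(self, kind: str, **detail):
        self.events.append({"ts": time.time(), "kind": kind, **detail})

    def replay(self) -> str:
        # The whole point: reconstruct the trajectory, not just the last error.
        return "\n".join(
            e["kind"] + ": " + json.dumps(
                {k: v for k, v in e.items() if k not in ("ts", "kind")}
            )
            for e in self.events
        )

log = TrajectoryLog()
log.record("tool_call", tool="search_tickets", args={"q": "refund"})
log.record("compaction", dropped_turns=12)
log.record("tool_call", tool="send_reply", args={"ticket": 42})
print(log.replay())
```

Six months from now, that `replay()` output is your incident report. Write it before you write anything else.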
The best practice that actually pays off: scope tool permissions per-agent, not per-organization. Every agent should have its own credential bundle with the minimum set of tool access it needs. The 14,000-IAM-roles story above is what happens when you don't do this. It's tedious to set up and pays for itself the first time an agent goes rogue.
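Per-agent scoping is cheap to enforce at the boundary. A hypothetical sketch — in production this would be backed by IAM or a secrets vault, and the token is obviously a placeholder:

```python
class CredentialBundle:
    """One scoped bundle per agent: the minimum tool set, nothing org-wide."""

    def __init__(self, agent: str, allowed_tools: set):
        self.agent = agent
        self.allowed_tools = set(allowed_tools)

    def token_for(self, tool: str) -> str:
        # Fail closed: a tool outside the scope never gets a credential.
        if tool not in self.allowed_tools:
            raise PermissionError(f"{self.agent} is not scoped for {tool!r}")
        return f"scoped-token:{self.agent}:{tool}"   # placeholder, not a real secret

triage = CredentialBundle("triage-bot", {"read_tickets", "post_reply"})
print(triage.token_for("read_tickets"))
try:
    triage.token_for("delete_customer")   # a rogue call fails at the boundary
except PermissionError as e:
    print("blocked:", e)
```

The design choice that matters is failing closed: the bundle is the only path to a credential, so an over-eager agent hits a `PermissionError` instead of a customer table.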
The anti-pattern I see most often: building a "god agent" with 30 tools and hoping the model picks the right one. It won't. Above roughly 10-12 tools in a single agent, tool-selection accuracy collapses. Hub-and-spoke with specialist sub-agents isn't just an architectural preference — it's a workaround for a real model limitation.
The under-appreciated pattern: state compaction as a first-class operation. When your agent's context starts to exceed 50% of the window, have it summarize its own state and start fresh. Anthropic's Managed Agents does this automatically; in LangGraph you have to wire it yourself. Agents that never compact eventually drown in their own history.
Five takeaways to act on this week
Audit your agent permissions today. Pull the IAM roles, API keys, and tool scopes for every agent in production. If any agent has access to something it hasn't used in 30 days, remove it. You'll find at least one over-permissioned agent. Everyone does.
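The 30-day check is mechanical enough to script. A hedged sketch — the data shapes (`grants` as agent-to-tools, `usage` as last-used timestamps) are invented for illustration; you'd populate them from your IAM and access logs:

```python
from datetime import datetime, timedelta

def stale_grants(grants: dict, usage: dict, now: datetime, max_idle_days: int = 30) -> list:
    """Flag every (agent, tool) grant not exercised within the idle window."""
    cutoff = now - timedelta(days=max_idle_days)
    flagged = []
    for agent, tools in grants.items():
        for tool in sorted(tools):
            last = usage.get((agent, tool))
            if last is None or last < cutoff:   # never used counts as stale
                flagged.append((agent, tool))
    return flagged

now = datetime(2026, 4, 19)
grants = {"triage-bot": {"read_tickets", "delete_customer"}}
usage = {("triage-bot", "read_tickets"): now - timedelta(days=2)}
print(stale_grants(grants, usage, now))   # → [('triage-bot', 'delete_customer')]
```

Run it weekly and revoke what it flags; the first run is usually the eye-opener.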
Decide your runtime posture explicitly. Write one paragraph: "We are a Managed / Hybrid / DIY shop because [reason]." If you can't finish the sentence, you're making the choice by accident, and accidental choices in this space get expensive fast.
Add trajectory logging before you add anything else. Every agent call, every tool invocation, every context compaction. Six months from now, your incident response will depend entirely on how good these logs are.
Treat shared skills like npm packages from 2018. Review the code. Pin versions. Run them in isolation first. The OpenClaw pattern will repeat — it's just a question of which community skill gets compromised next.
Don't architect for more than seven sub-agents in a hub-and-spoke. If you think you need more, you need another hub. Plan for hierarchical hubs from day one rather than discovering the seven-agent wall in production.
Deep dive resources worth your time
Anthropic: Managed Agents announcement and teardown — Why the $0.08/hr flat pricing matters, and what the harness actually includes. Read this first if you're evaluating Managed.
Anthropic: Effective Context Engineering for AI Agents — The essay that underpins the Managed Agents design decisions. Even if you go DIY, the patterns (just-in-time retrieval, state compaction, structured memory) are the real lesson.
TechCrunch: OpenAI Agents SDK April update — The clearest summary of the BYO-sandbox strategy and why OpenAI deliberately chose not to compete on runtime.
OpenAI: Agentic AI Foundation announcement — The political economy of the standards layer. Who signed, who didn't, and what that tells you about the next 18 months.
TrueFoundry: Multi-agent architecture patterns in production — The 66.4% hub-and-spoke number and the data behind it. Read for a grounded view of what actually ships.
Kai Wähner: Enterprise Agentic AI Landscape 2026 — The most honest treatment of vendor lock-in risk I've read this quarter.
MCP 2026 Roadmap — What standardizing tools looks like when the protocol goes to the Linux Foundation.
Sources and attribution
- Anthropic, Managed Agents Public Beta (April 8, 2026)
- Anthropic, Effective Context Engineering for AI Agents (April 2026)
- TechCrunch, OpenAI Agents SDK enterprise update (April 15, 2026)
- OpenAI, Agentic AI Foundation announcement (April 2026)
- MCP, 2026 Roadmap (blog.modelcontextprotocol.io)
- TrueFoundry, Multi-agent architecture in production survey (2026)
- Kai Wähner, Enterprise Agentic AI Landscape 2026 (April 6, 2026)
- Enterprise LLM API spend figures: referenced from market research cited in AAIF launch materials; 50% (2023) to 27% (2026).
- OpenClaw incident: community reports (April 2026, composite of multiple Discord and mailing list incidents).
The agent platform wars aren't over. They just stopped being about who has the best model and started being about who owns the runtime. Pick your boundary deliberately — because this week, the vendors finally drew theirs.
