The Actual Stack for Building 24/7 AI Agents Right Now
Open CLI for browser automation, Claude Managed Agents for persistent sandboxes, and Ultrareview for code quality. This is the stack that's actually working in production.
In the last article I talked about Manus Cloud Computer and why the hype around it is overblown. I promised I'd follow up with what I'd actually build on if I were setting up a 24/7 agent system for a client today. This is that article.
Three things have shipped in the last few weeks that, put together, give you a production-grade agent stack without depending on a single vendor's credit system or hoping their servers aren't busy. They're not competing with each other. They're layers.
Open CLI: browser automation that doesn't eat your context
If you've ever tried to build an agent that does repeated browser tasks, you know the problem with driving a browser through Playwright. In a typical setup, every step sends a full screenshot or page snapshot to your LLM. That's a massive chunk of context consumed on something the agent has already done a hundred times before. You're paying for the agent to look at a login page it's already memorized.
Open CLI takes a different approach. It turns websites into APIs that an agent can call programmatically. Instead of your agent navigating a page visually, it calls a custom tool that does the exact workflow directly. No screenshots. No extra JSON. No context bloat.
It already supports over 90 commonly used websites natively, including most social media platforms, research tools, and lead scraping sites. For anything it doesn't support out of the box, you build a custom CLI that wraps the specific workflow your agent needs.
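Here's a minimal sketch of what "the agent calls a CLI and gets structured data back" can look like from the agent's side. It does not use Open CLI's real command syntax, which I'm not reproducing here. The `run_cli_tool` helper is a hypothetical name, and the stand-in command is just the Python interpreter printing JSON, so the example runs anywhere.

```python
import json
import subprocess
import sys
from typing import Any


def run_cli_tool(command: list[str], timeout_s: float = 30.0) -> Any:
    """Run a CLI command and parse its stdout as JSON.

    Raises subprocess.CalledProcessError if the command exits non-zero,
    so a failed workflow never looks like an empty result.
    """
    result = subprocess.run(
        command,
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,
    )
    return json.loads(result.stdout)


# Stand-in for a real site-specific CLI (hypothetical): it prints one JSON object.
fake_cli = [
    sys.executable,
    "-c",
    "import json; print(json.dumps({'product': 'widget', 'price': 19.99}))",
]

data = run_cli_tool(fake_cli)
print(data["price"])
```

The point of the pattern is that the model only ever sees the small parsed result, never the page that produced it.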
The practical difference is significant. A workflow that costs you hundreds of thousands of tokens per day through Playwright because of screenshot processing costs almost nothing through Open CLI. The agent calls a tool, gets structured data back, and moves on. For any workflow that repeats the same browser automations daily, like checking prices, posting content, scraping leads, or monitoring competitors, this is the layer that makes it economically viable to run 24/7.
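To put rough numbers on that, here's a back-of-the-envelope estimate. Every figure in it is an illustrative assumption, not a measurement: about 1,500 tokens per screenshot step, 10 steps per run, 24 runs a day, versus a single tool call per run that returns about 150 tokens.

```python
def daily_tokens(tokens_per_step: int, steps_per_run: int, runs_per_day: int) -> int:
    """Tokens spent per day on observing the page or reading tool output."""
    return tokens_per_step * steps_per_run * runs_per_day


# Assumed: screenshot-driven run, 10 steps, ~1,500 tokens per screenshot.
screenshot_daily = daily_tokens(tokens_per_step=1500, steps_per_run=10, runs_per_day=24)

# Assumed: one structured tool result per run, ~150 tokens.
tool_daily = daily_tokens(tokens_per_step=150, steps_per_run=1, runs_per_day=24)

print(screenshot_daily)               # 360000
print(tool_daily)                     # 3600
print(screenshot_daily // tool_daily) # 100
```

Swap in your own measured numbers. The shape of the result, a gap of a couple of orders of magnitude for repeated workflows, is what matters.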
Claude Managed Agents: persistent sandboxes on Anthropic's servers
This is the one that changes the architecture. Anthropic shipped Managed Agents through their API, and I don't think most people have fully processed what this means for building agent systems.
A managed agent is a Claude instance running in a sandbox on Anthropic's servers. It has its own persistent file system. Files it creates stay there between sessions. You call it programmatically through the API, and the session persists. You can call it today, and it remembers what it did yesterday because the files are still there.
That alone would be useful, but the real power is what you can attach to it. You can create custom tools for your managed agent. A Telegram tool. A database connector. An Open CLI integration. Whatever your agent needs to interact with the outside world, you wire it up as a tool, and the agent calls it from inside the sandbox.
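As a sketch of what "wiring up a tool" means in practice, here's a tool definition in the name, description, and input_schema shape that Anthropic's Messages API uses for tool use, plus a small dispatcher that routes a tool call to a handler. Whether managed agents accept exactly this shape is an assumption on my part. The tool name, the handler, and the in-memory `outbox` are all hypothetical stand-ins. A real handler would call the Telegram Bot API.

```python
from typing import Any, Callable

# Tool definition in the shape used for tool use in the Messages API.
SEND_TELEGRAM_TOOL = {
    "name": "send_telegram_message",
    "description": "Send a text message to a Telegram chat.",
    "input_schema": {
        "type": "object",
        "properties": {
            "chat_id": {"type": "string", "description": "Target chat ID."},
            "text": {"type": "string", "description": "Message body."},
        },
        "required": ["chat_id", "text"],
    },
}

# Stand-in for a real Telegram client: messages are just collected in memory.
outbox: list[dict[str, str]] = []


def send_telegram_message(chat_id: str, text: str) -> dict[str, Any]:
    """Hypothetical handler: queue the message and report how many are queued."""
    outbox.append({"chat_id": chat_id, "text": text})
    return {"ok": True, "queued": len(outbox)}


HANDLERS: dict[str, Callable[..., dict[str, Any]]] = {
    SEND_TELEGRAM_TOOL["name"]: send_telegram_message,
}


def handle_tool_call(name: str, tool_input: dict[str, Any]) -> dict[str, Any]:
    """Route a tool call from the agent to the matching handler."""
    handler = HANDLERS.get(name)
    if handler is None:
        return {"ok": False, "error": f"unknown tool: {name}"}
    return handler(**tool_input)


result = handle_tool_call(
    "send_telegram_message",
    {"chat_id": "123", "text": "Daily report ready"},
)
print(result)
```

Returning an error object for unknown tools, instead of raising, lets the agent see the failure and try something else.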
You can run as many managed agents as you want. Each one gets its own sandbox, its own memory, its own set of custom tools. They can communicate with each other programmatically. You can trigger them on schedules using cron jobs. You can have one agent that monitors incoming messages, another that does research, another that drafts responses, and a coordinator that orchestrates all of them.
The security story is what makes this viable for real client work. These agents run on Anthropic's infrastructure. For any workflow that handles sensitive data, such as customer information, financial records, or internal documents, you now have a way to run persistent agents on infrastructure that Anthropic secures, instead of spinning up your own servers and hoping you got the security right.
This is essentially what Manus Cloud Computer is trying to be, except you control the architecture, you're not locked into one vendor's credit system, and you can swap components without rebuilding everything.
Ultrareview: code review without the bottleneck
The third piece is smaller, but it solves a real problem in agent-built systems. Anthropic released Ultrareview as a Claude Code feature. It uses agents in a sandbox to do deep code reviews quickly.
Why this matters for agent workflows specifically: when you're building systems where agents write and modify code, you need a way to verify what they produce before it hits production. Human code review is the traditional answer, but it doesn't scale when agents are producing code continuously.
Ultrareview runs a deep review that catches issues a quick glance would miss. It's a premium feature, typically $5 to $20 per review depending on the size of the change. That's not cheap for every commit, but for the review before you push to production, it's a fraction of what a senior developer's time costs.
The practical workflow looks like this: your managed agent builds or modifies a system, Ultrareview checks the output, and only reviewed code gets deployed. It's not replacing human oversight entirely, but it's compressing the review cycle from hours to minutes for routine changes.
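The gate itself is simple to express. Here's a minimal sketch, assuming you can reduce a review to an approved or rejected result with a list of findings. How you actually invoke Ultrareview and read its output is not modeled here. The `ReviewResult` type, `gate_deploy`, and the stub reviewer are hypothetical names for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ReviewResult:
    approved: bool
    findings: list[str] = field(default_factory=list)


def gate_deploy(
    change_id: str,
    review: Callable[[str], ReviewResult],
    deploy: Callable[[str], None],
) -> bool:
    """Deploy a change only if its review approves it. Returns True if deployed."""
    result = review(change_id)
    if not result.approved:
        return False
    deploy(change_id)
    return True


# Stub reviewer standing in for a real review step: rejects one known-bad change.
def stub_review(change_id: str) -> ReviewResult:
    if change_id == "change-2":
        return ReviewResult(approved=False, findings=["unvalidated input"])
    return ReviewResult(approved=True)


deployed: list[str] = []
outcomes = [gate_deploy(c, stub_review, deployed.append) for c in ("change-1", "change-2")]
print(outcomes)  # [True, False]
print(deployed)  # ['change-1']
```

Keeping the reviewer and the deploy step as injected functions means the gate stays the same if you later swap the review tool.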
How these layers fit together
Here's how I'd architect a 24/7 agent system for a client today using these three pieces.
Open CLI handles all browser-facing automation. Checking prices, posting to social media, scraping data, monitoring competitors. Anything that involves interacting with a website on a schedule. This keeps token costs manageable because you're not burning context on screenshots.
Claude Managed Agents run the intelligence layer. One agent monitors incoming requests or triggers. Another does the actual work, whether that's generating content, analyzing data, or making decisions. A third handles responses, sending messages through Telegram, email, or whatever channel the client uses. Each runs in its own persistent sandbox with its own tools and memory.
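The monitor, worker, and responder split can be sketched as a small pipeline. Each agent is a plain Python function here, which is an assumption made to keep the example runnable. In a real system each one would be a call to a separate managed agent, and `send` would be a Telegram or email tool. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Request:
    sender: str
    text: str


def monitor(inbox: list[Request]) -> list[Request]:
    """Monitoring agent: keep only requests that contain actual text."""
    return [r for r in inbox if r.text.strip()]


def worker(request: Request) -> str:
    """Working agent: produce a draft (here, a trivial truncated summary)."""
    return f"Summary for {request.sender}: {request.text.strip()[:40]}"


def responder(request: Request, draft: str, send: Callable[[str, str], None]) -> None:
    """Responding agent: deliver the draft over whatever channel `send` wraps."""
    send(request.sender, draft)


def run_cycle(inbox: list[Request], send: Callable[[str, str], None]) -> int:
    """One scheduled cycle: monitor, then work, then respond. Returns requests handled."""
    handled = 0
    for request in monitor(inbox):
        responder(request, worker(request), send)
        handled += 1
    return handled


sent: list[tuple[str, str]] = []
count = run_cycle(
    [Request("alice", "Check competitor pricing"), Request("bob", "   ")],
    lambda to, body: sent.append((to, body)),
)
print(count)  # 1
print(sent)   # [('alice', 'Summary for alice: Check competitor pricing')]
```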
Ultrareview gates any code changes before deployment. If your agents are modifying workflows, writing scripts, or updating configurations, the changes get reviewed before they go live.
Cron jobs tie it all together. Scheduled triggers call the managed agents at the right times. The agents call Open CLI tools when they need to interact with websites. Everything logs to persistent file systems in the sandboxes so you can audit what happened and when.
The total cost of this stack is your Anthropic API usage plus whatever infrastructure you run the cron scheduler on. No credit system with unpredictable burn rates. No single vendor that can go down and take your entire operation with it. If Anthropic's API has an issue, the scripts behind your cron jobs back off and retry, provided you build that in, because cron does not retry a failed job on its own. If one managed agent fails, the others keep running.
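Cron itself does not retry a failed job, so the retry behavior has to live in the script that cron runs. Here's a minimal sketch of retry with exponential backoff. The crontab line, the script path, and the `flaky` function that stands in for an API call are all hypothetical.

```python
import time
from typing import Callable, TypeVar

# Example crontab entry (hypothetical path), running the script every 15 minutes:
#   */15 * * * * /usr/bin/python3 /opt/agents/run_cycle.py

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    attempts: int = 4,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on any exception with delays of 1x, 2x, 4x... base_delay_s."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error to cron's logs.
            sleep(base_delay_s * 2 ** attempt)
    raise AssertionError("unreachable")


# Stand-in for an API call that fails twice, then succeeds.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("API unavailable")
    return "ok"


delays: list[float] = []
outcome = call_with_retry(flaky, sleep=delays.append)  # record delays instead of sleeping
print(outcome)  # ok
print(delays)   # [1.0, 2.0]
```

In production you would likely retry only on specific errors, such as timeouts and rate limits, and cap the total wait so one run can't overlap the next scheduled one.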
What this means if you're building agent workflows
If you're a developer building agent systems for clients, this stack gives you something you couldn't have six months ago: a way to run persistent, intelligent agents that interact with the web, handle sensitive data securely, and scale without the cost blowing up.
If you're a business owner who needs agent workflows but doesn't want to depend on Manus or any single platform, this is the architecture to ask for. It's modular. Every piece can be swapped. If something better than Open CLI comes along for browser automation, you swap that layer. If another LLM provider ships managed agents, you can migrate the intelligence layer. Nothing is locked in.
If you're exploring this space and trying to figure out what's real versus what's hype, here's the simplest test: can you run it for 30 days without babysitting it, and can you predict what it will cost? If the answer to both is yes, you have a production system. If either answer is no, you have a demo.
The tools to build real agent systems are here. The architecture patterns are proven. The question isn't whether 24/7 agents are possible anymore. It's whether the people building them understand the difference between a demo that works once and a system that runs every day.