Verdict: ADAPT (6/10)
Claude Managed Agents (beta, 2026-04-01) is Anthropic's fully-managed agent runtime. You define an agent (model + system prompt + tools + MCP + skills), an environment (container config), and launch sessions where Claude autonomously runs bash, edits files, searches the web, and connects to MCP servers — all in Anthropic-hosted cloud containers.
Bottom line: Well-designed infrastructure that overlaps with ~7K-8.5K LOC of Amprealize's agent infrastructure. Full adoption is premature (beta, vendor lock-in, model lock-in). The right play is extract specific patterns — the outcomes/grader system and memory stores — while keeping our differentiated agent intelligence.
Core Architecture
| Concept | Description |
|---|---|
| Agent | Reusable, versioned config (model, system prompt, tools, MCP servers, skills). Claude 4.5+ models including Opus 4.6. |
| Environment | Ubuntu 22.04 container. Up to 8GB RAM, 10GB disk. Python 3.12+, Node 20+, Go 1.22+, Rust, Java, Ruby, PHP, C/C++. |
| Session | Running agent instance. Stateful multi-turn. Files persistent within session. Isolated containers per session. |
| Events/Streaming | SSE-based. User → Agent → Session events. Server-side history persistence. |
Built-in Tools (agent_toolset_20260401)
bash, read, write, edit, glob, grep, web_fetch, web_search. Individually toggleable with permission policies (always_allow / always_ask).
MCP Integration
MCP servers declared at agent level. Vault-based auth with auto-refresh OAuth2 credentials.
Research Preview Features
- Outcomes/Grader (HIGH VALUE): Define rubric → agent works → separate grader evaluates in isolated context → iterates (max 20). Returns per-criterion breakdown. Deliverables to
/mnt/session/outputs/. - Multiagent: Coordinator + callable_agents. One level of delegation. Shared filesystem, isolated context per thread.
- Memory Stores: Persistent across sessions, workspace-scoped. Auto-read before tasks, auto-write learnings. CRUD + version history + redaction. 100KB per memory, optimistic concurrency via SHA256.
Scores
| Dimension | Score | Notes |
|---|---|---|
| Relevance | 8/10 | Directly addresses agent execution — our core domain |
| Feasibility | 6/10 | Beta API, no pricing, vendor lock-in concerns |
| Novelty | 7/10 | Outcomes/grader is novel; rest is expected evolution |
| ROI | 5/10 | High integration cost for modest gain over existing infra |
| Safety | 7/10 | Sandboxed containers good; data-leaving-infra is a concern |
| Overall | 6/10 |
What to Extract for Amprealize
1. Outcomes/Grader Pattern (HIGH)
Separate-context evaluator grades agent work against a rubric. Directly applicable to compliance review and behavior validation. Prototype using existing LLM infra — strengthen compliance_service.py and agent_review_service.py. Tracked under GUIDEAI-896.
2. Memory Stores Architecture (MEDIUM)
Versioned, path-based memory with optimistic concurrency and audit trail. Consider whether WikiService should adopt: SHA-based conflict detection, memory versioning/redaction, auto-read/auto-write agent behavior.
3. Permission Policy Model (LOW-MEDIUM)
always_allow / always_ask gating with per-tool overrides. Cleaner than current ExecutionPolicy. Worth abstracting into ToolExecutor.
What NOT to Adopt
- The agent loop itself — Our GEP phases, behavior adherence tracking, and multi-surface parity are more sophisticated.
- Container execution — BreakerAmp for test infra; Anthropic-hosted containers are a non-starter for enterprise.
- MCP vault system — We have auth_tokens.py and agent_auth.py.
Competitive Landscape
| Platform | Type | Maturity | Key Differentiator |
|---|---|---|---|
| Claude Managed Agents | Managed runtime | Beta (2026-04) | Outcomes/grader, built-in caching |
| OpenAI Assistants API | Managed runtime | GA (v2) | File search, code interpreter sandbox |
| Google Vertex AI Agent Builder | Agent platform | GA | Multi-modal grounding, GCP integration |
| AWS Bedrock Agents | Managed runtime | GA | Multi-model, knowledge bases with RAG |
| LangGraph Cloud | Orchestration | GA | Model-agnostic, stateful graph execution |
| CrewAI | Multi-agent framework | Stable OSS | Python-native, role-based agents |
| Amprealize | Custom platform | Internal | GEP phases, behaviors, compliance, parity testing |
Related Work Items
- GUIDEAI-895: Add Session Mode for lightweight agent execution (goal)
- GUIDEAI-896: Spike: Prototype outcomes/grader pattern for compliance reviews
- GUIDEAI-869: Implement always-on persistent agent infrastructure
- GUIDEAI-718: Adopt OpenClaw production patterns for multi-agent operations