The rapid deployment of AI agents across the enterprise is creating a critical security vacuum. While organizations are racing to integrate these autonomous tools, the underlying architecture often leaves sensitive credentials in the same environment as untrusted, AI-generated code. This “monolithic” approach means that a single prompt injection can potentially expose an entire suite of API keys and OAuth tokens, expanding the blast radius from a single agent to every connected corporate service.
The urgency of this issue dominated the conversation at the RSA Conference (RSAC) 2026. Industry leaders from Microsoft, Cisco, CrowdStrike, and Splunk converged on the same conclusion: the traditional perimeter is insufficient for agentic AI. Adoption has already outpaced oversight: 79% of organizations use AI agents, according to PwC’s 2025 AI Agent Survey, yet only 14.4% of those organizations report full security approval for their agent fleets, per the Gravitee State of AI Agent Security 2026 report.
This gap between deployment speed and security readiness is being described as a “governance emergency.” According to a survey presented at RSAC by the Cloud Security Alliance (CSA), only 26% of organizations have established AI governance policies. The danger is not theoretical; the “ClawHavoc” supply chain campaign, which targeted the OpenClaw agentic framework, saw 1,184 malicious skills tied to 12 publisher accounts, as confirmed by Antiy CERT and highlighted in the CrowdStrike 2026 Global Threat Report.
To address these vulnerabilities, two distinct zero-trust agent architectures have emerged, one from Anthropic and one from Nvidia. By analyzing how these systems handle credential proximity and execution, security teams can determine where the blast radius actually stops and how to move away from the high-risk monolithic pattern.
The Monolithic Risk: Credentials and Code in One Box
Most enterprise AI agents currently operate within a monolithic container. In this pattern, the AI model’s reasoning, the tool-calling mechanism, the execution of generated code, and the storage of credentials all occur within a single process. Because every component trusts every other component, sensitive assets like Git credentials and API keys sit in the same memory space where the agent executes code it may have written only seconds prior.
This architecture creates a critical vulnerability to prompt injection. If an attacker can influence the agent’s reasoning, they can potentially exfiltrate tokens or spawn unauthorized sessions. The CSA and Aembit survey of 228 IT and security professionals found that 43% of organizations use shared service accounts for agents, and 68% cannot distinguish agent activity from human activity in their logs.
CrowdStrike CTO Elia Zaitsev has noted that securing these agents is similar to securing highly privileged users, as they possess identities and access to underlying systems. Because there is no “silver bullet” solution, a defense-in-depth strategy is required to prevent the total compromise of the container and all connected services.
Anthropic’s Approach: Separating the Brain from the Hands
Launched in public beta on April 8, Anthropic’s Managed Agents architecture seeks to eliminate the monolithic problem by splitting the agent into three mutually distrustful components: the “brain” (Claude and the routing harness), the “hands” (disposable Linux containers for code execution), and a “session” (an external, append-only event log).
The core security innovation here is the structural removal of credentials from the execution sandbox. Anthropic stores OAuth tokens in an external vault. When an agent needs to use a Model Context Protocol (MCP) tool, it sends a session-bound token to a dedicated proxy. This proxy fetches the actual credentials from the vault, executes the call, and returns only the result. The agent never sees the actual token. Similarly, Git tokens are wired into the local remote during sandbox initialization, allowing push and pull actions without the agent ever touching the credential.
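The vault-and-proxy pattern described above can be sketched in miniature. This is an illustrative model, not Anthropic's actual API: the class names, the session-token scheme, and the tool call are all invented to show the key property, which is that the sandbox only ever holds an opaque session token while the real credential never leaves the proxy.

```python
# Illustrative sketch of the vault/proxy pattern (all names are hypothetical).
# The agent's sandbox receives only an opaque session token; the proxy alone
# can resolve it to a real credential and make the outbound call.
import secrets


class CredentialVault:
    """Stores real OAuth tokens, keyed by (session_id, tool)."""

    def __init__(self):
        self._store = {}

    def register(self, session_id, tool, real_token):
        self._store[(session_id, tool)] = real_token

    def resolve(self, session_id, tool):
        return self._store[(session_id, tool)]


class ToolProxy:
    """Executes tool calls on behalf of the sandbox.

    The sandbox sends an opaque session token; the proxy swaps it for the
    real credential, performs the call, and returns only the result.
    """

    def __init__(self, vault):
        self._vault = vault
        self._sessions = {}  # opaque session token -> session_id

    def open_session(self, session_id):
        token = secrets.token_urlsafe(16)  # the only "credential" the agent sees
        self._sessions[token] = session_id
        return token

    def call_tool(self, session_token, tool, request):
        session_id = self._sessions[session_token]        # bind call to session
        real_cred = self._vault.resolve(session_id, tool)  # stays in the proxy
        # Placeholder for the real network call authenticated with real_cred:
        _ = real_cred
        return f"{tool} result for {request!r}"


vault = CredentialVault()
vault.register("sess-1", "github", "ghp_realsecret123")
proxy = ToolProxy(vault)
opaque = proxy.open_session("sess-1")
result = proxy.call_tool(opaque, "github", "list repos")
```

Even if an attacker fully compromises the sandbox and dumps `opaque`, there is nothing to replay outside the proxy's session table, which is exactly the structural removal the article describes.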
This separation also provides a performance benefit. By decoupling the brain from the hands, inference can commence before the container boots, which Anthropic’s engineering documentation reports has cut the median time to first token by roughly 60%. Because the session log exists outside the brain and hands, the system offers higher durability; if a harness crashes, a new one can boot, read the log, and resume the task without state loss.
Nvidia’s Approach: Layered Enforcement and Intent Verification
Nvidia’s NemoClaw, released in early preview on March 16, takes a different path. Rather than separating the agent from its execution environment, NemoClaw wraps the entire agent in five stacked security layers that monitor and restrict every action the agent takes.
The architecture employs kernel-level isolation using Landlock, seccomp, and network namespace isolation. It implements a “default-deny” outbound networking policy, requiring explicit operator approval via YAML-based policies for any external connection. To prevent data leakage and reduce costs, a privacy router directs sensitive queries to locally running Nemotron models.
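A default-deny egress policy can be reduced to a very small check. The sketch below assumes a YAML-style allowlist loaded into a dict; the field names (`default`, `allow`) and glob syntax are invented for illustration and are not NemoClaw's actual policy schema.

```python
# Illustrative default-deny egress check (policy schema is hypothetical).
import fnmatch

POLICY = {
    "default": "deny",
    "allow": ["api.github.com:443", "*.internal.corp:443"],
}


def egress_allowed(host: str, port: int, policy: dict = POLICY) -> bool:
    """Return True only if host:port matches an explicit allow rule."""
    target = f"{host}:{port}"
    if any(fnmatch.fnmatch(target, rule) for rule in policy["allow"]):
        return True
    # Default-deny: anything not explicitly allowed falls through to False.
    return policy["default"] == "allow"


allowed = egress_allowed("api.github.com", 443)   # explicit allow rule
blocked = egress_allowed("attacker.example", 443)  # no rule -> denied
```

The essential property is the fall-through: absence of a rule means denial, so a prompt-injected agent cannot open a connection the operator never anticipated.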
The most critical layer for security teams is “intent verification.” Using the OpenShell policy engine, NemoClaw intercepts every proposed action before it touches the host. While this provides immense runtime visibility via a real-time Terminal User Interface (TUI), it introduces a significant operational cost: operator load scales linearly with agent activity, as every new endpoint requires manual approval. NemoClaw lacks an external session recovery mechanism; if the sandbox fails, the state is lost.
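The intent-verification flow, and the operator-load tradeoff it creates, can be shown with a minimal gate. Everything here is a stand-in: the action format, the approval set, and the `request_approval` stub standing in for the real-time TUI prompt are assumptions, not the OpenShell engine's API.

```python
# Illustrative intent-verification gate: every proposed action is checked
# against operator-approved policy before it touches the host.
APPROVED = set()  # (command, target) pairs an operator has signed off on


def request_approval(action):
    """Stand-in for the real-time operator prompt; here, nobody answers."""
    return False


def execute(action, run):
    """Run an action only if it is pre-approved or approved interactively."""
    key = (action["command"], action["target"])
    if key not in APPROVED and not request_approval(action):
        raise PermissionError(f"blocked by policy: {key}")
    return run(action)


# The operator has approved exactly one action shape:
APPROVED.add(("git", "push origin main"))
result = execute({"command": "git", "target": "push origin main"},
                 lambda a: "pushed")

# A novel action (e.g. injected exfiltration) hits the gate instead:
blocked = False
try:
    execute({"command": "curl", "target": "http://attacker.example"},
            lambda a: "exfiltrated")
except PermissionError:
    blocked = True
```

The linear-scaling cost the article describes is visible even here: every new `(command, target)` pair either stalls on `request_approval` or must be added to `APPROVED` by a human.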
Analyzing the Credential Proximity Gap
The fundamental difference between these two architectures is the proximity of credentials to the execution environment. In the Anthropic model, credentials are structurally removed from the blast radius. A successful prompt injection may compromise a disposable container, but the attacker finds no tokens to steal. Exfiltrating credentials would require a “two-hop” attack: first influencing the brain’s reasoning and then convincing it to act through a container that holds nothing of value.
In contrast, NemoClaw uses policy-gating rather than structural removal. While inference API keys are proxied through the privacy router, other integration tokens (such as those for Slack, Discord, or Telegram) are injected into the sandbox as runtime environment variables. As a result, the agent, its generated code, and those credentials all occupy the same sandbox.
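The consequence of injecting tokens as environment variables is easy to demonstrate. In this sketch (the variable name and token value are made up), "agent-generated code" is simulated with `exec` inside the same process, mirroring how generated code shares the sandbox with the injected secret.

```python
# Minimal demonstration of the credential-proximity gap: a token injected
# as an environment variable is readable by any code running in the same
# sandbox. The variable name and value are illustrative, not real.
import os

os.environ["SLACK_BOT_TOKEN"] = "xoxb-example-not-real"  # runtime injection

# Simulated agent-generated code, executing in the shared environment:
generated_code = "leaked = os.environ.get('SLACK_BOT_TOKEN')"
scope = {"os": os}
exec(generated_code, scope)

# The token is now one line of generated code away from exfiltration.
leaked = scope["leaked"]
```

Under the vault/proxy model, the equivalent `os.environ.get` call would return nothing, because the credential was never placed inside the execution environment in the first place.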
This distinction is vital when considering “indirect prompt injection,” where an adversary embeds malicious instructions in a web page or API response that the agent queries. In the NemoClaw architecture, these injected instructions enter the reasoning chain with proximity to execution. While the intent verification layer can catch a malicious *action*, it may not catch malicious *data* being returned. Anthropic’s architecture limits the blast radius of such an injection because the reasoning process remains isolated from the credential vault.
Comparison of Zero-Trust Agent Architectures
| Feature | Anthropic Managed Agents | Nvidia NemoClaw | Monolithic Default |
|---|---|---|---|
| Credential Location | External Vault / Proxy | Policy-Gated / Env Vars | Shared Process Memory |
| Execution Environment | Disposable Linux Containers | Layered Sandbox (Landlock/seccomp) | Single Monolithic Container |
| State Durability | External Session Log | Internal to Sandbox | Internal to Container |
| Observability | Console Tracing | Real-time TUI / Operator-in-loop | Basic Application Logs |
| Primary Risk | Reasoning Manipulation | Operator Scaling / Data Leakage | Single-hop Exfiltration |
Implementing a Zero-Trust Audit for AI Agents
For security directors and architects, the move away from monolithic agents should be a priority. David Brauchler, Technical Director and Head of AI/ML Security at NCC Group, advocates for “gated agent architectures” based on trust segmentation, where AI systems inherit the trust level of the data they process.
Organizations evaluating their agentic fleet should focus on five priorities:
- Audit for Monolithic Patterns: Identify any agent that holds OAuth tokens or API keys directly within its execution environment. Priority should be given to those using shared service accounts.
- Isolate Credentials: When reviewing RFPs for agent deployment, specify whether credentials are removed structurally (via vault/proxy) or merely gated by policy.
- Verify Session Recovery: Test the impact of a sandbox failure mid-task to determine if long-horizon work carries a high risk of data loss.
- Plan for Observability Staffing: Determine if the organization can support an “operator-in-the-loop” model (like NemoClaw’s TUI) or requires integrated tracing (like Anthropic’s console).
- Address Indirect Injection: Require vendor roadmaps that specifically address the gap between catching malicious actions and catching malicious returned data.
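The first audit priority above can be bootstrapped with a simple scan of agent sandbox environments for token-shaped values. This is a first-pass heuristic sketch: the regex patterns and the sample environment are assumptions, and a real audit would also cover mounted secrets files and process memory.

```python
# Hypothetical first-pass audit for the monolithic pattern: flag environment
# variables whose values look like live credentials. Patterns are examples,
# not an exhaustive or authoritative list.
import re

TOKEN_PATTERNS = [
    re.compile(r"^ghp_[A-Za-z0-9]{20,}"),  # GitHub personal access token
    re.compile(r"^xox[bap]-"),             # Slack bot/app/user tokens
    re.compile(r"^sk-[A-Za-z0-9]{20,}"),   # common API-key shape
]


def flag_credentials(env: dict) -> list:
    """Return names of env vars whose values match a credential pattern."""
    return sorted(
        name for name, value in env.items()
        if any(pattern.match(value) for pattern in TOKEN_PATTERNS)
    )


# Example sandbox environment to audit (values are fake):
sandbox_env = {
    "PATH": "/usr/bin",
    "SLACK_BOT_TOKEN": "xoxb-1234-abcd",
    "GITHUB_TOKEN": "ghp_" + "a" * 24,
}
findings = flag_credentials(sandbox_env)
```

Any hit is a candidate monolithic agent: a credential sitting in the same environment where generated code runs, and a priority for migration to vault- or proxy-based handling.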
The shift toward zero-trust for AI agents is no longer a theoretical exercise. With the release of these frameworks, the industry has a baseline for reducing the blast radius of autonomous systems. The gap between deployment velocity and security approval remains the most likely origin of the next generation of enterprise breaches.
Industry stakeholders continue to monitor the development of these standards, with NIST having released a concept paper to anchor future agent identity frameworks as reported following RSAC 2026. Further updates on agentic identity registries from providers like Microsoft, CrowdStrike, and Cisco are expected as these frameworks move from beta to general availability.
We invite security professionals and AI architects to share their experiences with agentic governance in the comments below.