Agent security: running untrusted code safely

In Part 2, I described an interactive agentic environment: OpenCode with MCP servers and skills. But what happens when agents run unattended? What about scheduled maintenance agents, automated code review bots, or ops agents that triage production issues?

An unattended AI agent that writes code, runs terminal commands, and accesses external APIs is an autonomous remote code execution engine with network access. In security terms, it's the nightmare scenario. Put your cloud credentials inside that agent, and one prompt injection later your production infrastructure is compromised.

Pattern 2: zero-secret architecture

The solution is a security model I call Pattern 2 (documented in nix/autonomous-agents-design.md):

The agent process holds no long-lived secrets: its only credential is a short-lived Bedrock key scoped to model invocation. A control plane (MCP broker) on the host holds every other credential and mediates every privileged action. The agent can only do what the broker exposes as a narrow, audited tool.

┌───────────────────────── HOST (trusted) ─────────────────────────┐
│                                                                   │
│  MCP-tool BROKER (control plane) — holds ALL credentials          │
│    Tools: mysql_ro_query · cloudwatch_get_logs · sqs_peek_dlq     │
│           jira_read · jira_create_ticket · github_open_pr         │
│           write_handoff_doc                                       │
│    Audit log: every call logged (who/what/when/result)            │
│                                                                   │
│         ▲ Unix domain socket (bind-mounted into sandbox)          │
│         │                                                         │
│  Egress allowlist proxy — bedrock-runtime + registries/git only  │
│  Bedrock token minter — scoped AssumeRole → short-term API key   │
│  Supervisor (systemd --user) — spawn/scope/teardown per job      │
│                                                                   │
│   ┌──────────── gVisor sandbox (runsc, untrusted) ────────────┐  │
│   │  agent runtime + restricted tool manifest                 │  │
│   │  env: AWS_BEARER_TOKEN_BEDROCK (short-lived, scoped)      │  │
│   │  FS: one scoped bind-mount (code agent) / none (ops)      │  │
│   │  NET: closed except broker socket + allowlisted egress    │  │
│   └──────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────┘

Threat model and mitigations

Vector	Mitigation
Prompt injection (from repo/Jira/Slack)	Manual or scheduler triggers only. Injected text is data, never a trigger. Restricted tool manifest limits blast radius.
Code-generation exploits	gVisor syscall isolation; ephemeral sandbox; no host FS beyond one scoped share.
Secret exfiltration	Zero secrets in sandbox. Only a short-lived, scoped Bedrock API key.
Context poisoning	Ephemeral per-job sandbox; no persistent agent memory across jobs.
Tool abuse	Broker enforces read-only + scope + per-action policy; high-risk actions hard-blocked.
Data exfil via network	Default-deny egress; allowlist = Bedrock runtime + package registries/git host.
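
The default-deny egress rule in the last row can be sketched as an exact-match host check. This is a minimal illustration, not the proxy's actual implementation: the registry and git host names are assumptions, and only the Bedrock runtime endpoint follows from the region used later in this post.

```typescript
// Default-deny egress check: a host is reachable only if it is on the allowlist.
// The host names are illustrative assumptions, not the real allowlist.
const EGRESS_ALLOWLIST: readonly string[] = [
  "bedrock-runtime.eu-central-1.amazonaws.com", // Bedrock runtime endpoint
  "registry.npmjs.org",                         // hypothetical package registry
  "github.com",                                 // hypothetical git host
];

// Normalise a host name so that case, padding, or a trailing dot
// cannot be used to dodge the comparison.
function normaliseHost(host: string): string {
  return host.trim().toLowerCase().replace(/\.$/, "");
}

// Exact match only: no wildcard or suffix matching, so a look-alike such as
// "github.com.evil.example" is rejected.
function isEgressAllowed(
  host: string,
  allowlist: readonly string[] = EGRESS_ALLOWLIST,
): boolean {
  const wanted = normaliseHost(host);
  if (wanted.length === 0) return false;
  return allowlist.indexOf(wanted) !== -1;
}
```

Exact matching is deliberately stricter than suffix matching: anything not listed verbatim is refused, which is the point of a default-deny policy.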

The two agents

Two autonomous agents, each with a distinct capability matrix.

Code agent


Purpose	Code work on local repos + GitHub
Reads	One repo's source tree (bind-mounted)
Writes (allowed)	Repo edits, auto-commit, feature branch, open PR
Writes (forbidden)	Merge, push to `main`, force-push
Secrets held	None (Bedrock token only, scoped to `bedrock:InvokeModel`)
Network	Broker socket + Bedrock + registries/git host

Ops agent


Purpose	Triage: read Jira/Slack, query infra read-only, draft handoffs/tickets
Reads	Everything via broker (Jira, Slack, MySQL RDS RO, CloudWatch, SQS DLQs)
Writes (allowed)	Create Jira tickets + local handoff docs
Writes (forbidden)	Any Slack/email post, any DB write, any infra mutation
Secrets held	None (Bedrock token only)
Network	Broker socket + Bedrock only

The MCP broker: control plane

The broker exposes exactly 7 tools. Each has per-action enforcement:

Tool	Used by	Enforcement
`mysql_ro_query`	ops	Read-only DB user; statement allowlist (SELECT/SHOW/EXPLAIN); row/time caps
`cloudwatch_get_logs`	ops	Read-only IAM; scoped log groups
`sqs_peek_dlq`	ops	receive-without-delete (peek only); scoped DLQ ARNs
`jira_read`	ops	Read-only Jira token
`jira_create_ticket`	ops	Create-only; project allowlist; no transitions or comments; nothing posted to live channels
`github_open_pr`	code	Open-PR-only token; no merge, no push to `main`
`write_handoff_doc`	ops	Writes to a host review dir only
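
As an illustration of the `mysql_ro_query` row, here is a minimal sketch of a statement allowlist plus a row cap. The cap value and function names are assumptions. A first-keyword check is only a coarse filter and is easy to underestimate: the read-only database user remains the real enforcement, and this check just rejects obviously wrong requests early.

```typescript
// First-keyword allowlist for mysql_ro_query (SELECT/SHOW/EXPLAIN).
const ALLOWED_KEYWORDS: readonly string[] = ["SELECT", "SHOW", "EXPLAIN"];
const MAX_ROWS = 1000; // hypothetical row cap

function isAllowedStatement(sql: string): boolean {
  // Drop one trailing semicolon, then refuse anything that still contains one.
  // This rejects stacked statements such as "SELECT 1; DROP TABLE t". It is
  // deliberately conservative: a semicolon inside a string literal is refused too.
  const stmt = sql.trim().replace(/;\s*$/, "");
  if (stmt.length === 0 || stmt.indexOf(";") !== -1) return false;

  // The statement must start with a bare keyword (no comments, no parentheses).
  const match = /^[A-Za-z]+/.exec(stmt);
  if (match === null) return false;
  return ALLOWED_KEYWORDS.indexOf(match[0].toUpperCase()) !== -1;
}

// Clamp a caller-supplied row limit to the range [1, MAX_ROWS].
function clampRowLimit(requested: number): number {
  if (!isFinite(requested) || requested < 1) return 1;
  return Math.min(Math.floor(requested), MAX_ROWS);
}
```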

Hard-blocked everywhere: Slack post, email send, DB write, infra mutation, git merge, force-push, push to protected branches.
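
One way to read "hard-blocked" is that no tool exists for the action at all, so the broker has nothing to dispatch to. The sketch below assumes that reading: the tool names and owners come from the table above, while the function and type names are hypothetical.

```typescript
type AgentKind = "code" | "ops";

// Tool name -> the agent allowed to call it, per the broker table.
const TOOL_OWNERS: { [tool: string]: AgentKind } = {
  mysql_ro_query: "ops",
  cloudwatch_get_logs: "ops",
  sqs_peek_dlq: "ops",
  jira_read: "ops",
  jira_create_ticket: "ops",
  github_open_pr: "code",
  write_handoff_doc: "ops",
};

type Decision = { allowed: true } | { allowed: false; reason: string };

// Default-deny: a call passes only if the tool is registered and belongs to
// the calling agent. Slack posts, DB writes, merges and so on are never
// registered, so they fall through to "unknown tool".
function authorize(agent: AgentKind, tool: string): Decision {
  // hasOwnProperty guards against inherited keys such as "toString".
  if (!Object.prototype.hasOwnProperty.call(TOOL_OWNERS, tool)) {
    return { allowed: false, reason: "unknown tool: " + tool };
  }
  if (TOOL_OWNERS[tool] !== agent) {
    return { allowed: false, reason: tool + " is not in the " + agent + " manifest" };
  }
  return { allowed: true };
}
```

For example, `authorize("ops", "slack_post")` is denied as an unknown tool, and `authorize("code", "mysql_ro_query")` is denied because the tool belongs to the ops manifest.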

Every tool call is appended to a structured audit log: agent ID, tool name + params, timestamp, result (success/failure + truncated response). The broker refuses to serve calls that bypass the logger.
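
The "no log, no answer" rule can be sketched as a wrapper around each tool handler. The entry fields follow the list above. The truncation limit, the function names, and the synchronous style are assumptions made to keep the example short.

```typescript
interface AuditEntry {
  agentId: string;
  tool: string;
  params: string;    // JSON-encoded parameters
  timestamp: string; // ISO 8601
  ok: boolean;
  response: string;  // truncated result or error message
}

const MAX_RESPONSE_CHARS = 200; // hypothetical truncation limit

function truncate(text: string, limit: number = MAX_RESPONSE_CHARS): string {
  return text.length <= limit ? text : text.slice(0, limit) + "...[truncated]";
}

// Runs a tool handler and records the outcome before returning it.
// If appendToLog throws, the error propagates and the caller never
// receives the tool result.
function auditedCall(
  appendToLog: (entry: AuditEntry) => void,
  agentId: string,
  tool: string,
  params: { [key: string]: unknown },
  handler: () => string,
): string {
  let ok = true;
  let response = "";
  try {
    response = handler();
  } catch (err) {
    ok = false;
    response = err instanceof Error ? err.message : String(err);
  }
  appendToLog({
    agentId,
    tool,
    params: JSON.stringify(params),
    timestamp: new Date().toISOString(),
    ok,
    response: truncate(response),
  });
  if (!ok) throw new Error(tool + " failed: " + response);
  return response;
}
```

Failures are logged as well as successes, and the log write sits between the handler and the return, so there is no code path that hands back a result without an entry.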

LLM inference: session-scoped Bedrock API key

The most important design decision here is how the sandboxed agent calls the LLM.

Rejected option: keep a long-lived API key behind the broker. That forces a choice between proxying every LLM request through the broker and handing the long-lived key to the agent.

Selected approach: at job start, the host mints a short-term Amazon Bedrock API key and injects it into the sandbox as AWS_BEARER_TOKEN_BEDROCK. The key is derived from a scoped AssumeRole session:

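// Assumes the AWS SDK for JavaScript v3; jobId is supplied by the supervisor.
import { STS } from "@aws-sdk/client-sts";

// Region assumed to match the model ARN below.
const sts = new STS({ region: "eu-central-1" });

const { Credentials } = await sts.assumeRole({
  RoleArn: "arn:aws:iam::...:role/agent-bedrock-minter",
  RoleSessionName: `agent-${jobId}`,
  // STS expects the session policy as a JSON string, not an object.
  Policy: JSON.stringify({
    Version: "2012-10-17",
    Statement: [{
      Effect: "Allow",
      Action: [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      Resource: [
        "arn:aws:bedrock:eu-central-1::foundation-model/anthropic.claude-sonnet-4-*"
      ]
    }]
  }),
  // 12 hours or job length, whichever is shorter. The role's
  // MaxSessionDuration must permit the requested value.
  DurationSeconds: 43200
});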

Why it's safe: the key authenticates only against the Bedrock runtime, so a leaked key cannot pivot to RDS, CloudWatch, SQS, or S3. It lives for at most 12 hours, or for the length of the job if that is shorter. The session policy restricts it to specific model ARNs, so the agent can't invoke arbitrary models. CloudWatch billing alarms and model-invocation logging don't prevent abuse, but they make token-cost DoS and prompt exfiltration visible.
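
The "12 hours or job length, whichever is shorter" rule and the session policy can be sketched as two small helpers. The function names are hypothetical. The 900-second floor is the minimum DurationSeconds that STS AssumeRole accepts, and the policy is serialised because STS takes it as a JSON string.

```typescript
const MIN_SESSION_SECONDS = 900;   // lowest DurationSeconds STS AssumeRole accepts
const MAX_SESSION_SECONDS = 43200; // 12 hours, the ceiling used in this design

// Session length = job length, clamped to [900 s, 12 h].
function sessionDurationSeconds(expectedJobSeconds: number): number {
  // Unknown or nonsensical job length: fail short rather than long.
  if (!isFinite(expectedJobSeconds)) return MIN_SESSION_SECONDS;
  const wanted = Math.ceil(expectedJobSeconds);
  return Math.max(MIN_SESSION_SECONDS, Math.min(wanted, MAX_SESSION_SECONDS));
}

// Builds the session policy as the JSON string STS expects.
function buildSessionPolicy(modelArns: readonly string[]): string {
  if (modelArns.length === 0) {
    throw new Error("refusing to build a policy with no model ARNs");
  }
  return JSON.stringify({
    Version: "2012-10-17",
    Statement: [{
      Effect: "Allow",
      Action: ["bedrock:InvokeModel", "bedrock:InvokeModelWithResponseStream"],
      Resource: modelArns,
    }],
  });
}
```

A one-hour job would get `sessionDurationSeconds(3600)`, which is 3600 seconds, while a job expected to run for two days is still capped at 43200.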

Isolation layer: gVisor

The sandboxing technology is gVisor: a user-space kernel that intercepts every syscall from the agent process. Unlike containers under Docker's default runtime (runc), which share the host kernel directly, gVisor puts a second kernel boundary between the agent and the host:

# :ro mounts the repo read-only; a code agent that edits files needs a writable mount
docker run --runtime=runsc \
  --rm \
  -v /path/to/repo:/workspace:ro \
  -e AWS_BEARER_TOKEN_BEDROCK="$TOKEN" \
  agent-image \
  run-task.sh

From inside the sandbox, ~/.aws, ~/.ssh, ~/.ginmon, and the host /home/ are not reachable. Egress is denied to everything except the allowlist. The Bedrock token cannot call any non-Bedrock AWS API (for example, aws s3 ls is denied). Forbidden tools are absent or blocked.
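
A supervisor would typically assemble that command line per job. The sketch below builds the argument list for the two agent types, assuming a hypothetical broker socket path and a fixed in-sandbox mount point. Any gVisor runtime configuration needed to permit host sockets is not shown.

```typescript
interface SandboxSpec {
  image: string;
  command: string[];
  repoPath?: string;        // code agent: one scoped bind-mount; ops agent: none
  repoReadOnly?: boolean;   // append ":ro" to the repo mount
  brokerSocketPath: string; // hypothetical host path of the broker's Unix socket
}

// Builds the argv for `docker run` under the gVisor runtime (runsc).
function buildSandboxArgs(spec: SandboxSpec): string[] {
  const args: string[] = ["run", "--runtime=runsc", "--rm"];
  if (spec.repoPath !== undefined) {
    const suffix = spec.repoReadOnly ? ":ro" : "";
    args.push("-v", spec.repoPath + ":/workspace" + suffix);
  }
  // The broker socket is the only other host path exposed to the sandbox.
  args.push("-v", spec.brokerSocketPath + ":/run/broker.sock");
  // "-e NAME" without a value makes Docker copy the variable from its own
  // environment, which keeps the token out of the argument list.
  args.push("-e", "AWS_BEARER_TOKEN_BEDROCK");
  args.push(spec.image);
  return args.concat(spec.command);
}
```

Passing the token by name rather than by value is a small design choice: the supervisor sets the variable in the child process environment, and the secret never appears in process listings.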

Verification checklist (before go-live)

□ Confirm ~/.aws, ~/.ssh, ~/.ginmon are NOT reachable from inside sandbox
□ Confirm only the scoped repo bind-mount is visible
□ Confirm egress to arbitrary hosts fails
□ Confirm Bedrock token cannot call S3/EC2/RDS
□ Confirm forbidden git commands are blocked
□ Confirm every broker call lands in the audit log

Job lifecycle

1. TRIGGER: Manual / scheduler only
   (No Jira/Slack polling — external text is data, never a trigger)

2. SPAWN: Supervisor mints scoped Bedrock token
   → starts fresh gVisor sandbox
   → bind-mounts scoped share
   → exposes the broker through a bind-mounted Unix domain socket
   → closes all other egress

3. RUN: Agent works within restricted tool manifest
   → LLM calls go directly to Bedrock (no broker bottleneck)
   → All privileged actions go through broker (audited)

4. ARTIFACTS:
   Code agent: auto-commits to feature branch → opens PR (never merges)
   Ops agent:  auto-creates Jira tickets → writes handoff docs to review dir

5. TEARDOWN: Sandbox destroyed
   → Bedrock token expires with its session (12 hours at most)
   → Outputs collected to audit log
   → Ephemeral: nothing to poison next run

The review boundary is the merge step (code) and ticket triage (ops), both human and out-of-band.
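
The lifecycle above, minus the external trigger, can be sketched as a function that runs the phases in order and guarantees teardown. The hook names are hypothetical, and the real supervisor would be asynchronous. The point of the sketch is only that TEARDOWN sits in a `finally` block.

```typescript
type Phase = "SPAWN" | "RUN" | "ARTIFACTS" | "TEARDOWN";

interface JobHooks {
  spawn: () => void;     // mint token, start sandbox, mount share, expose broker
  run: () => void;       // agent works inside the sandbox
  artifacts: () => void; // open PR, create tickets, write handoff docs
  teardown: () => void;  // destroy sandbox, collect outputs
}

// Runs the phases in order and returns the phases that were entered.
// TEARDOWN is in `finally`, so the sandbox is destroyed even when an
// earlier phase throws. The original error still propagates to the caller.
function runJob(hooks: JobHooks): Phase[] {
  const entered: Phase[] = [];
  try {
    entered.push("SPAWN");
    hooks.spawn();
    entered.push("RUN");
    hooks.run();
    entered.push("ARTIFACTS");
    hooks.artifacts();
  } finally {
    entered.push("TEARDOWN");
    hooks.teardown();
  }
  return entered;
}
```

A crashed or hijacked run therefore skips ARTIFACTS but never skips TEARDOWN, which is what keeps one job from leaving state behind for the next.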

Phase 2: Firecracker microVMs

gVisor is Phase 1. The design document already describes Phase 2: Firecracker microVMs.

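Firecracker is AWS's open-source virtual machine monitor (the technology behind Lambda and Fargate). Each agent runs in its own microVM with hardware virtualisation via KVM, which is stronger isolation than gVisor's user-space kernel: the guest boots its own kernel, so agent syscalls never reach the host kernel directly. vsock replaces the Unix domain socket for host-guest communication. Shared filesystems need a different mechanism than in Phase 1, because upstream Firecracker does not implement virtio-fs; the scoped share would have to be attached as a virtio block device or synced over vsock.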

The transition from gVisor to Firecracker is already tested. The repo has a working Hermes VM configuration (nix/vms/hermes/) that proves the networking, storage, and lifecycle management on this hardware.

Deployment: NixOS systemd services

The entire autonomous agent system is declared in Nix:

# Planned in nix/home/autonomous-agents.nix
services.agent-supervisor = {
  enable = true;
  agents = {
    code = {
      sandbox = "runsc";          # or "firecracker" in Phase 2
      repo-bind = "/home/usman/repos/ginmon-backend";
      schedule = "daily 06:00";
    };
    ops = {
      sandbox = "runsc";
      schedule = "daily 07:00";
    };
  };
};

The supervisor runs as a systemd --user service, and the per-agent sandboxes are spawned as transient systemd scopes for resource accounting.

What Pattern 2 enables

This architecture is the result of years of layered infrastructure. Without Nix, reproducing the sandbox image, broker, and systemd units across machines would be impractical. Without the agentic stack from Part 2, the broker MCP protocol would be foreign. Without the security posture from Part 3, the sops-nix integration and audit logging would require separate infrastructure.

In Part 5, we'll look at what's next: the Hermes agent system, full NixOS switch, skill architecture improvements, and the longer-term vision.